
7 amazing ways to apply deep learning

August 16, 2018.

Deep learning is capturing the imagination of programmers and stimulating their creativity, particularly in the fields of image and sound processing. Here are seven especially exciting applications, some of which are milestones in the history of artificial intelligence.

Estimating human poses in real time

Launched in 2016, this tool, based on convolutional neural networks, detects human figures in a video and estimates their poses in real time. The research team at Carnegie Mellon University has published it as the OpenPose library and is constantly improving it. Besides recognizing body poses, the system can now estimate hand positions and facial expressions, using 135 key points in all. It can even produce 3D models from multiple views of the same scene. This technology will no doubt find its way into motion capture for film production.

GitHub, “Realtime Multi-Person Pose Estimation.”

GitHub, “OpenPose.”


Making medical diagnoses

In London, DeepMind researchers worked with doctors from the Moorfields Eye Hospital to create an artificial intelligence tool able to accurately diagnose some 50 eye diseases. The tool analyzes 3D scans of the interior of the eye produced by optical coherence tomography. The system was “trained” on some 15,000 eye scans from 7,500 patients, along with the doctors’ diagnoses. Preliminary results show that the program reaches the same diagnosis as human doctors in 94% of cases. Researchers hope that the tool will soon help doctors process scans rapidly and detect which patients need urgent care.

DeepMind, “A major milestone for the treatment of eye disease.”


Becoming a Breakout superhero

Before it beat Go champions with AlphaGo, DeepMind developed a program that used deep reinforcement learning to become a superhuman champion of Breakout, the classic Atari 2600 arcade game. The program was given no information on how the game worked; it was simply told that the goal was to get the highest score possible. It eventually figured out how to score points, and two hours later it had reached expert level, using its self-learning capabilities to discover a winning strategy.

The New Yorker, “Artificial Intelligence Goes to the Arcade.”
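
The idea of learning from the score alone can be illustrated with a much smaller stand-in. The sketch below is not DeepMind's program, which used a deep neural network reading raw game pixels; it is plain tabular Q-learning on a hypothetical six-cell corridor where the only feedback is a reward of 1 for reaching the last cell. The function names, the corridor, and all parameter values are assumptions made for illustration.

```python
import random


def train_q_table(n_states=6, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor: the agent sees only state, action, reward."""
    rng = random.Random(seed)
    actions = (-1, +1)  # index 0 = step left, index 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Explore at random sometimes (or on a tie), otherwise exploit the best known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + actions[a], 0), n_states - 1)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Q-learning update: nudge the estimate toward reward + discounted best future value.
            best_next = max(q[next_state])
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action index for each non-terminal cell (0 = left, 1 = right)."""
    return [0 if left > right else 1 for left, right in q[:-1]]
```

Calling `greedy_policy(train_q_table())` should return a policy that steps right in every cell, even though the agent was never told where the reward was.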


Composing like Chopin

Francesco Marchesani of the Polytechnic University of Turin fed a recurrent neural network with MIDI files of Frédéric Chopin’s pieces for piano. The results are by no means masterpieces, but they definitely sound like Chopin. Research in algorithmic composition has since gone one step further: it is no longer a matter of copying a style in a statistical manner, but of producing truly innovative and original pieces.

Artists and Machine Intelligence, “Neural Nets for Generating Music.”
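
What “copying a style in a statistical manner” means can be shown with a far simpler model than a recurrent neural network. The sketch below is a first-order Markov chain, not the network described above: it records which note followed which in a training melody, then samples a new sequence from those recorded transitions. The melody, the function names, and the seed are hypothetical.

```python
import random
from collections import defaultdict


def learn_transitions(melody):
    """Record which pitches follow each pitch in the training melody."""
    table = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Sample a new melody by repeatedly picking a recorded successor at random."""
    rng = random.Random(seed)
    notes = [start]
    while len(notes) < length:
        successors = table.get(notes[-1])
        if not successors:  # dead end: this pitch was never followed by anything
            break
        notes.append(rng.choice(successors))
    return notes


# Hypothetical training melody as MIDI pitch numbers (60 = middle C); not a real Chopin excerpt.
training_melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]
table = learn_transitions(training_melody)
new_melody = generate(table, start=60, length=12)
```

Every pair of adjacent notes in `new_melody` already occurs somewhere in the training melody, which is why output of this kind imitates a style but cannot step outside it.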



Reading lips

A team of researchers from the Department of Computer Science at Oxford University has devised a lip-reading system, LipNet, that far surpasses the performance of both human lip-reading experts and existing systems. The program does have its limits: speakers must face the camera squarely and their faces must be brightly lit. But the team hopes to improve the tool and turn it into a useful aid for the deaf and hard of hearing.

University of Oxford, Department of Computer Science, “LipNet AI takes lip reading into the future.”


Colorizing photos and videos

Built on convolutional neural networks, this program developed by researchers at Tokyo’s Waseda University realistically colorizes grayscale images. Unlike most other ConvNet-based approaches, it can process images of any resolution. It works just as well on modern B&W photos as on century-old images.

Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa, “Let there be Color!”

See also:

Gustav Larsson, Michael Maire, Gregory Shakhnarovich, “Learning Representations for Automatic Colorization.”


Putting words in people’s mouths

Researchers at the University of Washington created a program that converts an audio file into mouth motions, then pastes these movements into existing videos. The final result is a video where speakers look like they’re saying something other than what they actually said. The example provided, using Barack Obama, is astonishing and even alarming. It is only a matter of time before “fake news” producers combine a voice-synthesizing tool with a video-processing tool, with potentially devastating consequences. The researchers explained that they used clips of Obama because of the wide availability of high-quality videos of the former President, which makes it easier to train the neural network.

Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman, “Synthesizing Obama: Learning Lip Sync from Audio.”