Mona Lisa’s smile may be hard to read, but artificial intelligence researchers are revealing a new side to her enigmatic expression.
A video uploaded to YouTube last week by engineer Egor Zakharov shows the iconic portrait translated into three different video clips, each featuring Mona Lisa moving her mouth and turning her head as if in conversation. The clips demonstrate that AI can now produce a realistic moving avatar from a single still image. (Jarring examples created from photographs of Albert Einstein, Marilyn Monroe and Salvador Dalí are also featured.)
Moscow-based Zakharov, an AI researcher with the Skolkovo Institute of Science and Technology and the Samsung AI Center, and his colleagues published their findings, which have not been peer-reviewed, on the preprint server arXiv.
Three-dimensional models of the human head are deeply complex, requiring “tens of millions of parameters,” the study authors say. Even with this advanced technology, passing living portraits off as the real thing isn’t likely, as our eyes are very good at spotting “even minor mistakes” in digital portrayals of humans. Experts call this phenomenon the “uncanny valley effect”: the unsettling feeling people get when a digital depiction of a human comes eerily close to reality without quite reaching it.
The “living portraits,” also called “deepfakes,” were created by a type of AI called a convolutional neural network, which analyzes and processes images in a way loosely inspired by the human visual system. Like earlier digital duplicates, they rely on a generative adversarial network (GAN), in which one AI learns to forge lifelike images by trying to fool a second AI that judges them. But Samsung’s system adds a step: it first scans for facial “landmarks,” reducing a face to just its nose, mouth, eyes, eyebrows and chin, then uses those points to produce a sequence of entirely novel facial expressions that seem to give the once-immutable images a personality.
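The landmark step is simple to illustrate in code. The sketch below uses the open-source dlib library and its standard 68-point face model; the choice of library and the file names are assumptions for illustration, not details taken from the study.

```python
import dlib

# Illustrative only: the study does not specify its tooling. dlib's standard
# 68-point model reduces a face to the same kind of landmarks the article
# describes: eyes, eyebrows, nose, mouth and the chin/jawline.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # pretrained model file

img = dlib.load_rgb_image("mona_lisa.jpg")   # hypothetical input image
for face in detector(img, 1):                # detect face bounding boxes
    shape = predictor(img, face)             # fit the 68 landmark points
    points = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
    # `points` is the stripped-down stick figure of the face that a generator
    # can be conditioned on, in place of the raw pixels.
```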
Researchers fed the AI a database of videos of human faces in motion, including VoxCeleb, a collection of clips featuring more than 7,000 celebrities, so it could learn key movements that generalize to any face. They then had the algorithm borrow the motion from three different video clips of real humans, producing three unique takes that recreate how the portrait’s subject might move.
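To see how borrowed motion steers a still image, here is a schematic in PyTorch. The tiny Embedder and Generator networks, their layer sizes, and the random tensors standing in for images and landmark drawings are all invented for illustration; they mimic the data flow described above, not the paper’s actual architecture.

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Toy stand-in: maps a source image plus its landmark drawing to a style vector."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
    def forward(self, img, lmk):
        return self.net(torch.cat([img, lmk], dim=1))

class Generator(nn.Module):
    """Toy stand-in: renders a frame from driving landmarks plus the style vector."""
    def __init__(self, dim=64):
        super().__init__()
        self.fc = nn.Linear(dim, 3 * 64 * 64)
        self.conv = nn.Conv2d(6, 3, 3, padding=1)
    def forward(self, lmk, style):
        base = self.fc(style).view(-1, 3, 64, 64)
        return torch.sigmoid(self.conv(torch.cat([base, lmk], dim=1)))

embedder, generator = Embedder(), Generator()
source_img = torch.rand(1, 3, 64, 64)     # the single portrait, e.g. Mona Lisa
source_lmk = torch.rand(1, 3, 64, 64)     # its rasterized landmark drawing
style = embedder(source_img, source_lmk)  # the subject's identity, captured once

# Landmarks tracked over 30 frames of a real person's clip drive the portrait.
driving_lmks = torch.rand(30, 1, 3, 64, 64)
frames = [generator(lmk, style) for lmk in driving_lmks]  # one output frame per pose
```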
The more facial data the AI is fed, the more detailed the result. The researchers found that living portraits created from 32 source images, rather than just one, could achieve “perfect realism.”
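That 1-versus-32 gap can be sketched as a short personalization loop. Continuing with the toy networks from the previous example (the loop, the loss and the step count are again assumptions, not the study’s training recipe):

```python
import torch
import torch.nn.functional as F

K = 32  # try K = 1 to mimic the single-image case the article describes
few_shot_imgs = torch.rand(K, 3, 64, 64)  # stand-ins for K photos of one person
few_shot_lmks = torch.rand(K, 3, 64, 64)  # their landmark drawings

# Average the per-image identity vectors, then fine-tune the generator so its
# renderings match the source photos. More images give the model more detail
# to copy, which is why 32 sources look far more faithful than one.
style = embedder(few_shot_imgs, few_shot_lmks).mean(0, keepdim=True).detach()
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)

for step in range(100):  # a handful of gradient steps personalizes the model
    recon = generator(few_shot_lmks, style.expand(K, -1))
    loss = F.l1_loss(recon, few_shot_imgs)  # reconstruction loss only, for brevity
    opt.zero_grad()
    loss.backward()
    opt.step()
```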
Some fear the new technology might embolden criminal impostors, though anyone interested in using Samsung’s technique will have to develop the software on their own. For now, study authors say their AI could have “practical applications for telepresence, including videoconferencing and multi-player games, as well as the special effects industry.”