
Google’s AI Can Now Lip Read Better than Humans

After watching thousands of hours of television, the AI annotated footage with 46.8% accuracy.


Engineers at Google’s DeepMind AI division have worked with researchers from Oxford University to create the most accurate lip-reading software to date. By training on thousands of hours of BBC footage, the scientists built a neural network that reads lips with 46.8% accuracy. By comparison, a professional human lip-reader who viewed the same footage guessed the right word only 12.4% of the time.

The software builds on work published earlier this month by another research group at Oxford. Using similar techniques, those scientists created software called LipNet, which read lips with 93.4% accuracy in some tests, compared to a human’s 52.3%. Note, however, that these tests used specially recorded video of people speaking clear, formulaic sentences to aid recognition. DeepMind’s software, “Watch, Listen, Attend and Spell,” was instead tested on natural, unscripted conversations from political shows.


More than 5,000 hours of footage were used to train Google’s software, including video from the television shows Newsnight, Question Time, and The World Today. The footage contained 118,000 sentences with some 17,500 unique words, compared to LipNet’s 51-word vocabulary.

Google suggests that the software could be used for a variety of applications, from annotating silent films to helping the hearing impaired. It could also be used to control digital assistants like Cortana and Siri by simply mouthing words (handy for when you’re in public). Of course, lip reading could also be used for surveillance, and the technology hearkens back to Stanley Kubrick’s 2001: A Space Odyssey, in which a rogue AI named HAL 9000 lip-reads astronauts in deep space and begins plotting against them. Researchers say there is a big difference between lip-reading a clear picture on TV and a grainy security feed, but as technology improves, that difference is bound to shrink.

source: Oxford University


David F.
A grad student in experimental physics, David is fascinated by science, space and technology. When not buried in lecture books, he enjoys movies, gaming and mountain biking.

