After watching thousands of hours of television, the AI lip-read with 46.8% accuracy.
Engineers at Google’s DeepMind AI division have worked with researchers from Oxford University to create the most accurate lip-reading software to date. By training on thousands of hours of BBC footage, the team built a neural network that lip-reads with 46.8% accuracy. By comparison, a professional human lip-reader shown the same footage guessed the right word only 12.4% of the time.
The software builds on work published earlier this month by another research group at Oxford. Using similar techniques, those scientists created a program called LipNet, which read lips with 93.4% accuracy in some tests, compared to 52.3% for humans. However, those tests used specially recorded video of people speaking clear, formulaic sentences, which made recognition easier. DeepMind’s software, “Watch, Listen, Attend and Spell,” was instead tested on natural, unscripted conversations from political shows.
Google’s software was trained on more than 5,000 hours of footage, including video from the television shows Newsnight, Question Time, and The World Today. That footage contained 118,000 sentences with some 17,500 unique words, compared to LipNet’s 51-word vocabulary.
Google suggests that the software could be used for a variety of applications, from annotating silent films to helping people with hearing impairments. It could also let people control digital assistants like Cortana and Siri simply by mouthing words (handy when you’re in public). Of course, lip reading could also be used for surveillance, and the technology hearkens back to Stanley Kubrick’s 2001: A Space Odyssey, in which a rogue AI named HAL 9000 reads the lips of astronauts in deep space and begins plotting against them. Researchers say there is a big difference between lip-reading clear TV footage and a grainy security feed, but as the technology improves, that gap is likely to shrink.
source: Oxford University