Voice recognition algorithms have advanced steadily in the past few years. With a new version of Dragon coming out each year, super-responsive virtual friends such as Siri, and now Google Voice Search, voice recognition is not going away anytime soon. But even though we may succeed in getting the computer to recognize our speech and transcribe it into words, how do we make the computer truly “understand” us?
Voice recognition currently relies on one or both of the following models. The acoustic model uses statistics to turn sounds into words. Once the first part of a word is spoken, there are only a finite number of subsequent sound combinations that would lead to a recognized word. The language model attempts to match words that usually go together.
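To make the language model idea concrete, here is a minimal sketch in Python. It is not how Dragon, Siri or Google Voice Search actually work. It assumes a toy bigram model (counting which word pairs occur together) and a tiny made-up training text. The names `CORPUS`, `bigram_prob`, `sentence_score` and `pick_transcription` are hypothetical, invented for this illustration. Given two candidate transcriptions that sound alike, it picks the one whose words more often go together.

```python
from collections import Counter

# Toy training text, made up purely for illustration.
CORPUS = (
    "please recognize speech clearly . "
    "computers recognize speech slowly . "
    "we went to a nice beach . "
    "recognize speech when i talk ."
).split()

# Count single words and adjacent word pairs (bigrams).
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(UNIGRAMS)


def bigram_prob(prev, word):
    """Estimate P(word | prev) with add-one smoothing,
    so word pairs never seen in training still get a small probability."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE)


def sentence_score(words):
    """Multiply the bigram probabilities of each adjacent word pair."""
    score = 1.0
    for prev, word in zip(words, words[1:]):
        score *= bigram_prob(prev, word)
    return score


def pick_transcription(candidates):
    """Return the candidate whose words most often go together."""
    return max(candidates, key=lambda text: sentence_score(text.split()))


# Two candidates an acoustic model might find hard to tell apart.
candidates = ["recognize beach", "recognize speech"]
best = pick_transcription(candidates)
print(best)
```

Both candidates have the same number of words, so their scores are directly comparable. A raw product of probabilities favors shorter sentences, so a fuller system would need to normalize for length and would combine this score with the acoustic model's own confidence.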
Though I can train my computer to convert my speech to text, other scenarios may pose additional problems, such as background noise, different accents and poor sound quality. Further, when I speak to Dragon or to Siri, or any of a number of my other virtual friends, I have to be deliberate and clear in my speech. We may think we are training the computers, but in reality, the computers are training us, because natural speech does not work that way. Natural, conversational speech is, more often than not, too fast and unclear for computers to understand.
(Disney and Pixar's WALL-E captivated young and old alike with the emotional tale of robots that can 'love'.)
Peter Robinson, Professor of Computer Technology at the University of Cambridge Computer Laboratory, wants to change all that. He wants computers to understand not only what we are saying, but how we are saying it. Along with the words themselves comes a wide variety of cues as to the emotions of the speaker. For instance, facial expressions, tone of voice, head movement and body movement all provide subtext to the meaning of the words. To account for this, Robinson has programmed into his computer 400 predefined mental states, which are triggered by reading these expressions and movements in the speaker. “The computer can correctly read my mind more than 70% of the time, and that’s as well as most people can understand me,” he remarks.
Robinson is not satisfied, however, with just having the computer read his emotions. He wants a computer that can respond emotionally as well. Of course, not all of us want such an emotional collection of bytes. I, for one, want my computer to do what I tell it to do. I don’t need a computer to do what it feels like doing. I’ve got children for that.