The Decade of Machines that Understand Speech
Alex Acero
Speech recognition has been an active area of research for the last 40 years. While it is starting to be used in some commercial applications, it is still far from the Star Trek computer we all want. Many of the predictions in science fiction movies like 2001: A Space Odyssey have been correct, but not the prediction of the intelligent computer that talks. In this talk I will give a brief historical overview and then describe some of the challenges this technology faces. Demos that illustrate the state of the art will be provided. Finally, I'll describe opportunities for speech technology during this decade.
Artificial Intelligence is the set of disciplines that tackle problems that humans find easy to solve but machines find hard. Speech recognition is the holy grail of artificial intelligence. Many users are perplexed that computers can beat a chess grandmaster yet cannot do something as simple as recognize speech reliably. This mismatch in expectations has caused many problems in the field.
Humans do much better than machines at recognizing speech because they don't simply transcribe the words; they also understand the message, and can therefore use context to guess a missing word (lost, perhaps, to background noise or to a lack of clarity on the speaker's part). Understanding and transcription often go hand in hand for humans, yet this is not the case for computers, which have a very limited capability for understanding. Much of the work that will happen this decade to break this status quo will have to do with improving this context model by adding domain-independent knowledge as well as personalization.
The cocktail party effect refers to the ability of humans to follow one conversation when several are taking place simultaneously. This is not possible with today's speech recognition technology. Scene analysis should take the incoming signal and interpret it as the sum of the two signals that has the highest likelihood, and a more powerful spectral analysis is needed for this to happen. In addition, the context model will be needed to break up the signal into two or more independent signals. This poses tremendous computational and algorithmic challenges that will need to be resolved before we can successfully talk to our smartphones in a train station or cafeteria.
Speech recognition works reasonably well when a speaker trains the system and articulates his or her speech carefully. When the user speaks in a more spontaneous manner, the error rate of recognition systems increases to the point of making them useless. A new paradigm is needed to better model this spontaneous style.
Alejandro ACERO
Senior Researcher and Manager, Speech Technology Group
Before joining Microsoft in 1994, I worked in the speech groups of Apple Computer and Telefonica Investigacion y Desarrollo. I received a Ph.D. from Carnegie Mellon University in 1990, a Master's from Rice University in 1987 and an engineering degree from the Universidad Politecnica de Madrid in 1985, all in Electrical Engineering. I'm also an affiliate Professor of Electrical Engineering at the University of Washington.
Research interests:
Speech Recognition: robustness to noise, rapid adaptation, acoustic modeling, signal processing.
Spoken Language Systems: rapid prototyping of speech understanding systems.
Speech Synthesis: automatically trained concatenative synthesis and distribution-based synthesis.
Dr. Peter F. KROGH
Dean Emeritus and Distinguished Professor of International Affairs, Georgetown University, Washington, D.C.
Studied Arts in Law and Diplomacy and Philosophy at Tufts University
1958-1960 | Trainee and Acting Assistant Branch Manager, The New England Merchants Bank, Boston
1961-1962 | Instructor in Government, Tufts University
1962-1967 | Assistant Dean, Fletcher School of Law and Diplomacy, Tufts University
1963-1967 | Host, television interview program "Backgrounds", WGBH-TV, Boston
1965 | Visiting Scholar, The Brookings Institution
1967-1968 | White House Fellow, Special Assistant to the Secretary of State
1968-1970 | Associate Dean, Fletcher School of Law and Diplomacy, Tufts University
1970-1995 | Dean and Professor of International Affairs, School of Foreign Service
1982-1988 | Moderator, weekly PBS television program on foreign affairs, "American Interests"
1988-2005 | Moderator, PBS television foreign affairs series "Great Decisions"
since 1995 | Dean Emeritus and Distinguished Professor of International Affairs, Georgetown University, Washington, D.C.