The Decade of Machines that Understand Speech

22.08.2003, 18:00-18:45

Schrödinger-Saal

Plenary

German and English language


Alex Acero

THE DECADE OF MACHINES THAT UNDERSTAND SPEECH

Speech recognition has been an active area of research for the last 40 years. While it is starting to be used in some commercial applications, it is still far from the Star Trek computer we all want. Many of the predictions in science fiction movies like 2001: A Space Odyssey have proved correct, but not the prediction of the intelligent computer that talks. In this talk I will give a brief historical overview and then describe some of the challenges this technology faces. Demos that illustrate the state of the art will be provided. Finally, I'll describe opportunities for speech technology during this decade.

Artificial Intelligence is the set of disciplines that tackle problems that humans find easy to solve but machines find hard. Speech recognition is the holy grail of artificial intelligence. Many users are perplexed that computers can beat a chess grandmaster yet cannot do something as simple as recognize speech reliably. This mismatch in expectations has caused many problems in the field.

Humans do much better than machines at recognizing speech because they don't simply transcribe the words; they also understand the message, and can therefore use context to guess a missing word (lost, perhaps, to background noise or a lack of clarity on the speaker's part). Understanding and transcription often go hand in hand for humans, yet this is not the case for computers, which have a very limited capacity for understanding. Much of the work that will happen this decade to break this status quo will have to do with improving this context model by adding domain-independent knowledge as well as personalization.

The cocktail party effect refers to the ability of humans to follow one conversation when several are taking place simultaneously. This is not possible with today's speech recognition technology. Scene analysis should take the incoming signal and interpret it as the sum of two signals that has the highest likelihood, and a more powerful spectral analysis is needed for this to happen. In addition, the context model will be needed to break up the signal into two or more independent signals. This poses tremendous computational and algorithmic challenges that will need to be resolved before we can successfully talk to our smartphones in a train station or cafeteria.

Speech recognition works reasonably well when a speaker trains the system and articulates his or her speech clearly. The error rate of recognition systems increases to the point of making them useless when the user speaks in a more spontaneous manner. A new paradigm is needed to better model this spontaneous style.

Alejandro ACERO

Senior Researcher and Manager, Speech Technology Group

Peter F. KROGH

Dean Emeritus and Distinguished Professor of International Affairs, Georgetown University, Washington, D.C.

Chair

Alejandro ACERO

Senior Researcher and Manager, Speech Technology Group

	Before joining Microsoft in 1994, I worked in the speech groups of Apple Computer and Telefonica Investigacion y Desarrollo. I received a Ph.D. from Carnegie Mellon University in 1990, a Master's from Rice University in 1987 and an engineering degree from the Universidad Politecnica de Madrid in 1985, all in Electrical Engineering. I'm also an affiliate Professor of Electrical Engineering at the University of Washington.
	Research interests:
	Speech Recognition: robustness to noise, rapid adaptation, acoustic modeling, signal processing.
	Spoken Language Systems: rapid prototyping of speech understanding systems.
	Speech Synthesis: automatically trained concatenative synthesis and distribution-based synthesis.

Dr. Peter F. KROGH

Dean Emeritus and Distinguished Professor of International Affairs, Georgetown University, Washington, D.C.

	Studied Arts in Law and Diplomacy and Philosophy at Tufts University
1958-1960	Trainee and Acting Assistant Branch Manager, The New England Merchants Bank, Boston
1961-1962	Instructor in Government, Tufts University
1962-1967	Assistant Dean, Fletcher School of Law and Diplomacy, Tufts University
1963-1967	Host, television interview program, "Backgrounds" - WGBH-TV, Boston
1965	Visiting Scholar, The Brookings Institution
1967-1968	White House Fellow, Special Assistant to the Secretary of State
1968-1970	Associate Dean, Fletcher School of Law and Diplomacy, Tufts University
1970-1995	Dean and Professor of International Affairs, School of Foreign Service
1982-1988	Moderator, weekly PBS television program on foreign affairs "American Interests"
1988-2005	Moderator, PBS television foreign affairs series: "Great Decisions"
since 1995	Dean Emeritus and Distinguished Professor of International Affairs, Georgetown University, Washington, D.C.

11:00 - 12:15	Opening	Plenary
11:15 - 12:00	Time of change – change as chance	Plenary
12:00 - 12:45	50 Years of Schrödinger's Reflections on Life and Living	Plenary
13:00 - 14:15	Location Strategies for Know-how Intensive Industries	Plenary
14:15 - 15:15	Medical Technology and Preventive Medicine	Plenary
15:15 - 16:30	The Future of European Research – New Instruments and Resources	Plenary
18:00 - 18:45	The Living Clock	Plenary
18:45 - 19:30	The Devices of Wonder – the Science of Devices of Wonder	Plenary

07:00 - 15:00	Working Group 1: Risk	Breakout
07:00 - 15:00	Working Group 2: R&D Infrastructure – a Location Strategy for Metropolitan Areas	Breakout
07:00 - 15:00	Working Group 3: Utilities and Infrastructure – the Backbone for an Industrialised Country	Breakout
07:00 - 15:00	Working Group 4: Kyoto and CO2 – Technology Pull and/or Location Push	Breakout
07:00 - 15:00	Working Group 5: Innovation Motor Micro- and Nanotechnologies	Breakout
07:00 - 15:00	Working Group 6: Brain gain, brain drain – Future networks Austria – USA	Breakout
07:00 - 15:00	Working Group 7: New mobility – new partnerships for the western Balkan countries	Breakout
07:00 - 15:00	Working Group 8: Medical Technology and Preventive Medicine – Finance and Organisation	Breakout
07:00 - 15:00	Working Group 9: Digitalisation of communication – “Your Personal Radio and TV-program”	Breakout
07:00 - 12:00	Off Alpbach	Plenary
18:00 - 18:45	The Decade of Machines that Understand Speech	Plenary
18:45 - 19:30	Technology and Know-how Management in and for Intelligence Services	Plenary

07:00 - 08:00	The Location of Science	Plenary
08:00 - 09:00	Reflections on the Alpbach Technology-Symposium – Presentation of “Junior Alpbach”	Plenary
09:30 - 10:15	Cosmic Background Radiation	Plenary
10:15 - 11:00	Architecture for Science – the New Architecture of Science	Plenary


Technology Symposium
