Doctoral Program in Computer Science
365 5th Avenue
New York City 10016
Room 4319
Phone: 212.817.8190
Fax: 212.817.1510
compsci@gc.cuny.edu

Computer Science Colloquium
 


Thursday, March 13, 4:15pm, 9206
 
Michael Chan  
(GE Global Research)
 
"Assisting Speech Recognition with Machine Lipreading"
 
Traditional automatic speech recognition (ASR) systems perform best when the target environment matches the one in which the training data were collected. In practice, such a match is not always achievable. Often, easily confusable phonemes (e.g., /m/ and /n/) become more confusable under noisy conditions, and recognition performance degrades as a result. We have developed a video-based lip-reading system that is capable of tracking and extracting visual parameters from the mouth area of the speaker. By combining visual features with traditional acoustic features (e.g., mel-cepstral coefficients), we demonstrate that substantial improvement in recognition accuracy is achievable at low SNR levels. We present the color-based segmentation and contour-based tracking algorithms we developed to support the extraction of visual features at 30 frames per second. We compare the effectiveness of geometric and appearance-based features, as well as combinations thereof, for speech recognition. Finally, we show that multi-stream Hidden Markov Models (HMMs) are superior to single-stream HMMs for constructing multi-modal ASR systems, especially for applications in noisy environments.

 
The Colloquium is supported by generous contributions from the CUNY Faculty Development Program, Bloomberg, Information Builders, Inc., and Royal Philips Electronics.
 

 
