Thursday, March 13, 4:15pm, 9206
 
Michael Chan  
(GE Global Research)
 
"Assisting Speech Recognition with Machine Lipreading"
 
Traditional automatic speech recognition (ASR) systems perform best
when the target environment matches the one in which the training data
were collected. In practice, such a match is not always achievable. Often,
easily confusable phonemes (e.g., /m/ and /n/) become more confusable
under noisy conditions, and consequently recognition performance is
degraded. We have developed a video-based lip-reading system that is
capable of tracking and extracting visual parameters from the mouth
area of the speaker. By combining visual features with traditional
acoustic features (e.g., mel-cepstral coefficients), we demonstrate
that substantial improvement in recognition accuracy is achievable
at low SNR levels. We present color-based segmentation and
contour-based tracking algorithms we developed to support the extraction
of visual features at 30 frames per second. We compare the effectiveness
of geometric and appearance-based features, as well as combinations
thereof, for speech recognition. Finally, we show that multi-stream
Hidden Markov Models (HMMs) are superior to single-stream HMMs for
constructing multi-modal ASR systems, especially for application in
noisy environments.
 
The Colloquium is supported by generous
contributions from the CUNY Faculty Development Program, Bloomberg,
Information Builders, Inc., and Royal Philips Electronics.
 
 