PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2010. — 214 p.
Conventional speech recognition systems are based on Gaussian hidden Markov models (HMMs). Discriminative techniques such as log-linear modeling have been investigated in speech recognition only recently. This thesis establishes a log-linear modeling framework in the context of discriminative training criteria, with examples from continuous speech recognition, part-of-speech tagging, and handwriting recognition. The focus is on the theoretical and experimental comparison of different training algorithms.
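As background for the framework summarized above: a log-linear model defines the class posterior as a normalized exponential of weighted feature functions. The following is a minimal sketch under simple assumptions (a shared feature vector and one weight vector per class); the function and variable names are illustrative, not taken from the thesis.

```python
import math

def log_linear_posterior(weights, features):
    """p(c | x) = exp(lambda_c . f(x)) / sum_c' exp(lambda_c' . f(x)).

    weights:  dict mapping class label -> weight vector (list of floats)
    features: feature vector f(x) (list of floats), shared across classes
    """
    scores = {c: sum(w * f for w, f in zip(lam, features))
              for c, lam in weights.items()}
    m = max(scores.values())                  # subtract max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())                    # normalization constant Z(x)
    return {c: e / z for c, e in exps.items()}
```

For example, with weights {'a': [1.0, 0.0], 'b': [0.0, 1.0]} and features [1.0, 0.0], class 'a' receives the larger posterior, and the posteriors sum to one by construction.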
Equivalence relations for Gaussian and log-linear models in speech recognition are derived. It is shown how to incorporate a margin term into conventional discriminative training criteria such as minimum phone error (MPE). This permits a direct evaluation of the utility of the margin concept for string recognition. The equivalence relations and the margin-based training criteria lead to a unified view of three major training paradigms, namely Gaussian HMMs, log-linear models, and support vector machines (SVMs). Generalized iterative scaling (GIS) is traditionally used for the optimization of log-linear models with the maximum mutual information (MMI) criterion. This thesis proposes an extension of GIS to log-linear models with hidden variables, and to other training criteria (e.g. MPE). Finally, investigations on convex optimization in speech recognition are presented. Experimental results are provided for a variety of tasks, including the European Parliament plenary sessions task and Mandarin broadcasts.
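The GIS update mentioned above can be illustrated for the simplest case: a conditional log-linear model without hidden variables, trained with the MMI (conditional likelihood) criterion. The classical update is lambda_i += (1/C) * log(E_emp[f_i] / E_model[f_i]), where C bounds the total feature count per observation. This is a toy sketch of that base case, not the thesis's generalization to hidden variables or MPE; the data layout and names are illustrative assumptions.

```python
import math

def posteriors(weights, feats_by_class):
    """Class posteriors of a log-linear model; feats_by_class: class -> {feature: value}."""
    scores = {c: sum(weights.get(k, 0.0) * v for k, v in f.items())
              for c, f in feats_by_class.items()}
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def gis_step(weights, data, C):
    """One GIS iteration; data: list of (feats_by_class, true_class) pairs.

    C must satisfy C >= sum of feature values of any observation.
    """
    emp, mod = {}, {}
    for feats_by_class, true_c in data:
        for k, v in feats_by_class[true_c].items():          # empirical counts
            emp[k] = emp.get(k, 0.0) + v
        post = posteriors(weights, feats_by_class)
        for c, f in feats_by_class.items():                  # model expectations
            for k, v in f.items():
                mod[k] = mod.get(k, 0.0) + post[c] * v
    new = dict(weights)
    for k in emp:
        if mod.get(k, 0.0) > 0.0:
            new[k] = new.get(k, 0.0) + math.log(emp[k] / mod[k]) / C
    return new
```

Each step moves the model expectations toward the empirical ones, so the posterior of the correct class (and hence the MMI criterion) increases monotonically on the training data.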
Scientific Goals
A Transducer-Based Discriminative Framework
Equivalence Relations
Margin-Based Training
Growth Transformations
Convex Optimization
Scientific Contributions
Outlook