MIT Press, 2006. — 501 p. — ISBN: 978-0-262-03358-9.
In recent years, semi-supervised learning has emerged as an exciting new direction in machine learning research. It is closely related to profound issues of how to do inference from data, as witnessed by its overlap with transductive inference (the distinctions are yet to be made precise).
At the same time, because it deals with the situation where relatively few labeled training points are available but a large number of unlabeled points are given, it is directly relevant to a multitude of practical problems where it is relatively expensive to produce labeled data, e.g., the automatic classification of web pages. As a field, semi-supervised learning uses a diverse set of tools and illustrates, on a small scale, the sophisticated machinery developed in various branches of machine learning such as kernel methods or Bayesian techniques.
In working on semi-supervised learning, we have been aware of the lack of an authoritative overview of the existing approaches. In a perfect world, such an overview should help both the practitioner and the researcher who wants to enter this area. A well-researched monograph could ideally fill such a gap; however, the field of semi-supervised learning is arguably not yet sufficiently mature for this. Rather than writing a book that would come out in three years, we decided instead to provide an up-to-date edited volume, for which we invited contributions by many of the leading proponents of the field. To make it more than a mere collection of articles, we have attempted to ensure that the chapters form a coherent whole and use consistent notation. Moreover, we have written a short introduction and a dialogue illustrating some of the ongoing debates in the underlying philosophy of the field, and we have organized and summarized a comprehensive benchmark of semi-supervised learning.
Benchmarks help the practitioner decide which algorithm to choose for a given application. At the same time, they help researchers choose issues to study and develop further. By evaluating and comparing the performance of many of the presented methods on a set of eight benchmark problems, this book aims to provide guidance in this respect. The problems are designed to reflect and probe the different assumptions that the algorithms build on. All data sets can be downloaded from the book web page, which can be found at http://www.kyb.tuebingen.mpg.de/ssl-book/.
Introduction to Semi-Supervised Learning
I Generative Models
A Taxonomy for Semi-Supervised Learning Methods
Semi-Supervised Text Classification Using EM
Risks of Semi-Supervised Learning
Probabilistic Semi-Supervised Clustering with Constraints
II Low-Density Separation
Transductive Support Vector Machines
Semi-Supervised Learning Using Semi-Definite Programming
Gaussian Processes and the Null-Category Noise Model
Entropy Regularization
Data-Dependent Regularization
III Graph-Based Methods
Label Propagation and Quadratic Criterion
The Geometric Basis of Semi-Supervised Learning
Discrete Regularization
Semi-Supervised Learning with Conditional Harmonic Mixing
IV Change of Representation
Graph Kernels by Spectral Transforms
Spectral Methods for Dimensionality Reduction
Modifying Distances
V Semi-Supervised Learning in Practice
Large-Scale Algorithms
Semi-Supervised Protein Classification Using Cluster Kernels
Prediction of Protein Function from Networks
Analysis of Benchmarks
VI Perspectives
An Augmented PAC Model for Semi-Supervised Learning
Metric-Based Approaches for Semi-Supervised Regression and Classification
Transductive Inference and Semi-Supervised Learning
A Discussion of Semi-Supervised Learning and Transduction