Overview
This chapter implements EM for Gaussian mixtures on the Peterson & Barney vowel dataset, builds a maximum-likelihood classifier, and deals with numerical issues such as singular covariances via regularisation.
You Will Learn
- Implementing EM in code for full-covariance GMMs
- Training class-conditional GMMs and using them for classification
- Visualising decision boundaries of density-based classifiers
- Detecting and fixing covariance singularities with diagonal jitter
Main Content
EM Implementation for 2D GMMs
You implement EM for 2D GMMs, carefully managing matrix operations for covariance updates and log-likelihood computation. Monitoring log-likelihood across iterations provides a sanity check that each EM step improves or maintains the objective.
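As a concrete reference point, a minimal EM loop for a full-covariance GMM might look like the sketch below. The function name em_gmm, the scipy dependency, and the initialisation strategy are illustrative choices, not the chapter's reference implementation:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    # X is (N, d); returns parameters plus per-iteration log-likelihoods
    # so the monotone-improvement sanity check described above is easy to run
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)                     # uniform mixing weights
    mu = X[rng.choice(N, K, replace=False)]      # random data points as means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities computed in log-space
        log_p = np.stack([np.log(pi[k]) +
                          multivariate_normal.logpdf(X, mu[k], Sigma[k])
                          for k in range(K)], axis=1)    # (N, K)
        log_norm = np.logaddexp.reduce(log_p, axis=1)    # log-sum-exp over K
        resp = np.exp(log_p - log_norm[:, None])
        log_liks.append(log_norm.sum())
        # M-step: weighted updates of weights, means, and covariances
        Nk = resp.sum(axis=0)
        pi = Nk / N
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(d)         # jitter; see Handling Singularities
    return pi, mu, Sigma, log_liks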
Vowel Classification via GMMs
By fitting separate GMMs to different vowel classes in F1–F2 space, you construct a density-based classifier. Visualising the resulting decision regions over a meshgrid reveals curved, overlapping boundaries that reflect the natural acoustic variability of speech, in contrast to the straight boundaries of linear classifiers.
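One way to turn the fitted class-conditional densities into a classifier and render its decision regions is sketched below. The helper names, axis ranges, and the plotting call are assumptions for illustration, not taken from the chapter:

import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_density(X, pi, mu, Sigma):
    # log p(x | class) under a GMM, evaluated in log-space
    log_p = np.stack([np.log(pi[k]) +
                      multivariate_normal.logpdf(X, mu[k], Sigma[k])
                      for k in range(len(pi))], axis=1)
    return np.logaddexp.reduce(log_p, axis=1)

def classify(X, class_params, log_priors):
    # class_params[c] = (pi, mu, Sigma) for vowel class c, e.g. from em_gmm above
    scores = np.stack([log_priors[c] + gmm_log_density(X, *class_params[c])
                       for c in range(len(class_params))], axis=1)
    return scores.argmax(axis=1)

# Decision regions over a meshgrid in F1-F2 space (axis ranges are illustrative)
f1, f2 = np.meshgrid(np.linspace(200, 1200, 200), np.linspace(500, 3500, 200))
grid = np.column_stack([f1.ravel(), f2.ravel()])
# labels = classify(grid, class_params, log_priors).reshape(f1.shape)
# plt.contourf(f1, f2, labels, alpha=0.3)  # shows the curved, overlapping boundaries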
Handling Singularities
When features are redundant or when too many components are used, EM can produce covariance matrices with extremely small eigenvalues. You detect this via determinants or eigenvalue checks and stabilise training by adding εI to each covariance after updates. This regularisation step is critical in high-dimensional settings and mirrors practices in deep learning (e.g., adding ε in batch normalisation).
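A simple eigenvalue-based check (a sketch; the tolerance is an illustrative choice) can flag trouble before it derails training; the corresponding stabilisation step appears in the Examples snippet below:

import numpy as np

def is_near_singular(Sigma, tol=1e-8):
    # The smallest eigenvalue is a more reliable indicator than the determinant,
    # which can be tiny simply because the overall scale of the data is small
    return np.linalg.eigvalsh(Sigma).min() < tol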
Examples
Covariance Regularisation Snippet
Stabilising each covariance by clamping its eigenvalues from below, a variant of adding diagonal jitter.
import numpy as np

eps = 1e-6  # floor on the eigenvalues; tune to the scale of the data
for k in range(K):
    # Sigmas[k] is the k-th component's 2x2 (or general dxd) covariance
    eigvals, eigvecs = np.linalg.eigh(Sigmas[k])
    # Clamping from below guarantees a minimum eigenvalue of eps, a slightly
    # stronger safeguard than simply adding eps*I to the diagonal
    eigvals_clamped = np.clip(eigvals, eps, None)
    Sigmas[k] = eigvecs @ np.diag(eigvals_clamped) @ eigvecs.T
Common Mistakes
Using naïve Gaussian PDFs that underflow in high dimensions
Why: Multiplying many small densities can underflow to zero in floating point.
Fix: Work in log-space when computing responsibilities: use log-sum-exp tricks rather than directly multiplying densities.
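For instance, with scipy's logsumexp (a sketch; log_p is assumed to hold the per-component log joint densities):

import numpy as np
from scipy.special import logsumexp

def responsibilities(log_p):
    # log_p[n, k] = log pi_k + log N(x_n | mu_k, Sigma_k); normalising in
    # log-space keeps the result finite even when every raw density underflows
    return np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))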
Stopping EM based only on parameter changes without monitoring log-likelihood
Why: Parameters may change little but the log-likelihood may still improve (or vice versa).
Fix: Track both and define a stopping criterion based on relative log-likelihood improvements and a maximum number of iterations.
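A combined criterion might be wired up as follows (a sketch; the step callback and thresholds are assumptions):

import numpy as np

def run_em(step, tol=1e-6, max_iter=200):
    # step() performs one E+M update and returns the current log-likelihood
    # (hypothetical callback; see the em_gmm sketch above for a concrete loop)
    prev_ll = -np.inf
    for it in range(max_iter):
        ll = step()
        # Stop on small relative improvement, capped by a maximum iteration count
        if np.isfinite(prev_ll) and abs(ll - prev_ll) <= tol * abs(prev_ll):
            break
        prev_ll = ll
    return it + 1, ll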
Mini Exercises
1. Run EM with different numbers of components K on the vowel data (e.g., 1, 2, 3, 6) and compare classification accuracy and decision boundaries.