Dimensionality Reduction Notebook — PCA on MNIST & Feature Selection

Apply IncrementalPCA and low-rank PCA to 70k MNIST digits, reconstruct images, compare classification on raw vs reduced features, and run feature selection on Diabetes data.

Overview

This chapter applies PCA to MNIST for compression and visualisation, compares classification performance on raw vs PCA-reduced features, and explores feature selection on the Diabetes dataset.

You Will Learn

  • Using IncrementalPCA and low-rank PCA on large datasets
  • Visualising reconstruction quality as more components are added
  • Training MLPs on raw and PCA-transformed MNIST features
  • Computing correlation and chi-squared scores for feature selection

Main Content

Incremental PCA on MNIST

You apply IncrementalPCA to 70,000 MNIST images, processing data in batches to avoid memory issues. Inspecting explained variance ratios reveals that relatively few components suffice to capture most of the variability in handwritten digits, enabling substantial dimensionality reduction.
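As a quick sketch of inspecting explained variance, the snippet below fits IncrementalPCA batch by batch and accumulates the explained variance ratios. It uses scikit-learn's small `load_digits` dataset (8×8 images) as a fast stand-in for the full 70k MNIST set; the batching pattern is the same.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

# 8x8 digits stand in for 28x28 MNIST to keep the sketch fast
X = load_digits().data.astype(np.float64)

# Fit in batches so the full matrix never needs to be processed at once;
# each batch must contain at least n_components samples
ipca = IncrementalPCA(n_components=30, batch_size=200)
for start in range(0, len(X), 200):
    ipca.partial_fit(X[start:start + 200])

# Cumulative explained variance shows how few components carry most of the signal
cum = np.cumsum(ipca.explained_variance_ratio_)
print(f"30 of 64 dimensions explain {cum[-1]:.1%} of the variance")
```

Plotting `cum` against the component index is the usual way to pick a cut-off such as 95% explained variance.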

Reconstruction Experiments

By projecting images to m components and back, you qualitatively assess how much information each additional component provides. Reconstructions with 10 components are blurry but recognisable; with 100–150 components they become nearly indistinguishable from originals. This visually ties explained variance ratios to perceptual quality.
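The round trip described above can be sketched with `transform` followed by `inverse_transform`; mean squared reconstruction error drops as components are added. Again `load_digits` serves as a small stand-in for MNIST.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # 8x8 digits stand in for 28x28 MNIST

errors = []
for m in (5, 20, 40):
    pca = PCA(n_components=m).fit(X)
    # Project to m components, then map back to pixel space
    X_rec = pca.inverse_transform(pca.transform(X))
    errors.append(np.mean((X - X_rec) ** 2))
    print(f"m={m:3d}  mean squared reconstruction error: {errors[-1]:.2f}")
```

Reshaping rows of `X_rec` back to images and plotting them next to the originals gives the qualitative comparison described above.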

MLP Classification on Raw vs PCA Features

You train identical MLP architectures on raw 784-dimensional inputs and on PCA-reduced inputs of various dimensionalities. Measuring accuracy and training time demonstrates the trade-off: PCA typically preserves accuracy while reducing training time and risk of overfitting, especially when the classifier has many parameters in the first layer.
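A minimal version of this comparison, using `load_digits` in place of MNIST and a deliberately small MLP, might look like the following. Note that PCA is fitted on the training split only and the learned projection is reused for the test split.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pca = PCA(n_components=20).fit(X_tr)  # fit on training data only

scores = {}
for name, tr, te in [("raw", X_tr, X_te),
                     ("pca", pca.transform(X_tr), pca.transform(X_te))]:
    # Identical architecture on both representations
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    clf.fit(tr, y_tr)
    scores[name] = clf.score(te, y_te)
    print(f"{name}: test accuracy {scores[name]:.3f}")
```

On the full MNIST set the wall-clock difference is more pronounced, since the first-layer weight matrix shrinks from 784×h to 20×h parameters.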

Feature Selection on the Diabetes Dataset

For the Diabetes dataset, you compute Pearson correlation between each feature and the target, as well as chi-squared scores for discretised features. Ranking features by these scores highlights the most predictive variables. Comparing models trained on the full feature set, on PCA components, and on selected features reveals how different dimensionality reduction strategies impact interpretability and performance.

Examples

IncrementalPCA Usage Sketch

A sketch of applying IncrementalPCA with batching on MNIST. Here `iterate_mnist_batches` and `X_full` are placeholders for your own data-loading code.

from sklearn.decomposition import IncrementalPCA

# Fit incrementally so the full 70k x 784 matrix never has to fit in memory
ipca = IncrementalPCA(n_components=100)
for X_batch in iterate_mnist_batches():  # yields batches of flattened images
    ipca.partial_fit(X_batch)

# Project the full dataset onto the first 100 principal components
X_reduced = ipca.transform(X_full)

Common Mistakes

Projecting validation/test data using PCA fitted on the full dataset

Why: Fitting PCA on all data leaks information from validation/test into the projection, compromising evaluation.

Fix: Fit PCA on the training set only, then apply the learned transform to validation and test sets.
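The leak-free pattern is short: fit on the training split, then reuse the learned projection everywhere else. A minimal sketch, again using `load_digits` as a stand-in dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, _ = load_digits(return_X_y=True)
X_train, X_test = train_test_split(X, random_state=0)

pca = PCA(n_components=30).fit(X_train)  # fit on training data only
X_train_red = pca.transform(X_train)
X_test_red = pca.transform(X_test)       # reuse the learned projection
```

Wrapping PCA in a scikit-learn `Pipeline` enforces this automatically under cross-validation.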

Choosing number of components solely by variance explained without considering downstream task performance

Why: Some components capturing small variance may still be crucial for prediction.

Fix: Combine explained variance analysis with cross-validated performance of downstream models.
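One way to do this is to treat the number of components as a hyperparameter and pick it by cross-validated downstream accuracy. A sketch with `load_digits` and logistic regression as the downstream model:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)

# PCA is refitted inside each CV fold, so no information leaks across folds
pipe = Pipeline([("pca", PCA()), ("clf", LogisticRegression(max_iter=2000))])
search = GridSearchCV(pipe, {"pca__n_components": [10, 20, 40]}, cv=3)
search.fit(X, y)
print("best n_components:", search.best_params_["pca__n_components"])
print(f"best CV accuracy: {search.best_score_:.3f}")
```

The best setting by CV accuracy need not be the one that maximises explained variance, which is exactly the point of this check.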

Mini Exercises

1. Train an MLP on MNIST using raw pixels, 50-component PCA, 100-component PCA and 200-component PCA. Compare accuracy and training time.

2. On the Diabetes dataset, compare feature subsets chosen by correlation and chi-squared score. Where do they agree and differ?

Further Reading