9

Dimensionality Reduction — PCA, MNIST & Feature Selection

Summary

This chapter tackles high-dimensional data with Principal Component Analysis (PCA). It applies IncrementalPCA to the full MNIST dataset (70,000 digit images of 784 pixels each), uses explained-variance analysis to choose how many components to keep, computes a low-rank PCA with torch.pca_lowrank() for efficiency, reconstructs digit images from their principal components, and compares MLP classification accuracy on raw pixels against PCA-reduced features. It also covers feature selection with correlation coefficients and chi-squared tests on the Diabetes dataset.
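
As a rough preview of the explained-variance and reconstruction steps, the sketch below runs PCA through a plain NumPy SVD on small synthetic data. It is an illustration under stated assumptions, not the chapter's implementation: the synthetic matrix stands in for MNIST, the SVD stands in for IncrementalPCA and torch.pca_lowrank(), and the helper names pca_fit and n_components_for and the 95% threshold are hypothetical choices made for this example.

```python
import numpy as np


def pca_fit(X):
    """Return (mean, components, explained_variance_ratio) from an SVD of centered X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2 / (X.shape[0] - 1)
    return mean, Vt, variance / variance.sum()


def n_components_for(ratio, threshold=0.95):
    """Smallest k whose cumulative explained variance reaches the threshold."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio))


# Synthetic stand-in for image data (assumption): 500 samples with 64 features
# that really live near a 5-dimensional subspace, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 64))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 64))

mean, components, ratio = pca_fit(X)
k = n_components_for(ratio, threshold=0.95)

# Project onto the top-k components, then map back to the original feature space.
Z = (X - mean) @ components[:k].T
X_hat = Z @ components[:k] + mean

# Fraction of the centered data's variance lost by keeping only k components.
rel_err = np.sum((X - X_hat) ** 2) / np.sum((X - mean) ** 2)
print(f"components kept: {k}, relative reconstruction error: {rel_err:.4f}")
```

The relative reconstruction error equals one minus the cumulative explained variance at k, which is why a variance threshold is a reasonable way to pick the number of components. On data as large as MNIST, a batched or low-rank method avoids holding a full SVD in memory.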