THEORY

Deep Learning — CNNs, Transfer Learning & Contrastive Learning

How CNNs see images, why pretrained features transfer across tasks, and how contrastive learning teaches networks to organize representations.

Overview

This chapter introduces convolutional neural networks (CNNs), transfer learning with pretrained models, and supervised contrastive learning, with a focus on image tasks such as CIFAR-10.

You Will Learn

  • How CNNs process images with convolution and pooling
  • Residual connections in architectures like ResNet
  • What transfer learning is and when to use it
  • How supervised contrastive learning shapes representation spaces
  • The role of data augmentation and learning rate schedules in deep learning

Main Content

Convolutional Layers and Feature Maps

CNNs exploit the spatial structure of images. A convolutional layer applies learnable filters (kernels) across the height and width of an input, computing local dot products to produce feature maps. Weight sharing across spatial locations drastically reduces the number of parameters compared to fully-connected layers and encodes translation equivariance: shifting the input shifts the activations in the same way.
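A minimal PyTorch sketch makes the parameter-sharing point concrete (the filter count and input size here are illustrative, chosen to match CIFAR-10 dimensions):

```python
import torch
import torch.nn as nn

# One convolutional layer applied to a CIFAR-10-sized input.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)       # batch of one 32x32 RGB image
features = conv(x)
print(features.shape)               # torch.Size([1, 16, 32, 32])

# Weight sharing: 16 filters of shape 3x3x3 plus 16 biases,
# independent of the 32x32 spatial size of the input.
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)                     # 448
```

A fully connected layer mapping the same input to the same output size would need over 50 million weights; the convolution needs 448 regardless of image resolution.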

Pooling and Hierarchical Features

Pooling layers, such as max pooling, downsample feature maps by summarising local regions (e.g., 2×2 windows). This builds spatial invariance and reduces dimensionality, allowing deeper networks to capture increasingly abstract features. Early layers detect edges and textures; later ones respond to object parts and whole-object patterns.
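A short sketch of 2×2 max pooling (the channel and spatial sizes are illustrative):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Halves each spatial dimension, leaving channels unchanged.
fmap = torch.randn(1, 16, 32, 32)
print(pool(fmap).shape)   # torch.Size([1, 16, 16, 16])

# Each 2x2 window is summarised by its maximum activation.
window = torch.tensor([[[[1., 2.],
                         [3., 4.]]]])
print(pool(window))       # tensor([[[[4.]]]])
```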

Residual Connections and ResNet

Deep networks suffer from vanishing gradients and degradation as depth increases. Residual networks address this by introducing skip connections: instead of learning H(x) directly, a residual block learns F(x) such that H(x) = F(x) + x. This allows gradients to flow directly along identity paths and enables training of very deep networks.
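The block structure above can be sketched as a small PyTorch module (a simplified version of ResNet's basic block, assuming matching input and output channel counts so the identity skip needs no projection):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: H(x) = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity skip connection

block = ResidualBlock(16)
x = torch.randn(2, 16, 8, 8)
print(block(x).shape)   # torch.Size([2, 16, 8, 8])
```

Because the skip path is the identity, the gradient of the output with respect to `x` always contains a direct term, which is what keeps gradients from vanishing as blocks are stacked.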

Transfer Learning

Training large CNNs from scratch requires substantial data and compute. Transfer learning reuses convolutional layers from a network pretrained on a large dataset (e.g., ImageNet) and adapts them to a new task by replacing and training only the final classification layer or by fine-tuning a subset of layers with a smaller learning rate. This leverages learned low- and mid-level features that are broadly useful across visual tasks.

Supervised Contrastive Learning

Supervised contrastive learning trains an encoder to produce representations where samples of the same class are close and different classes are far apart. The SupCon loss treats all samples with the same label in a batch as positives and others as negatives, shaping the embedding space before training a simple classifier on top. This often leads to more robust and transferable representations than directly training with cross-entropy on logits.

Examples

Freezing Pretrained Layers in PyTorch

Load a pretrained ResNet and freeze all convolutional blocks.

import torch
import torchvision.models as models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in resnet.parameters():
    param.requires_grad = False

# Replace final FC layer for 10 classes
in_features = resnet.fc.in_features
resnet.fc = torch.nn.Linear(in_features, 10)

Common Mistakes

Training deep networks from scratch on tiny datasets

Why: Overfitting and poor generalisation are likely; the model capacity is mismatched to data size.

Fix: Use transfer learning, heavy augmentation, smaller models, or regularisation techniques to cope with limited data.

Using the same learning rate for all layers during fine-tuning

Why: Earlier layers contain generic features that should change slowly; later layers are more task-specific.

Fix: Use discriminative learning rates (smaller rates for earlier layers, larger ones for later layers), or freeze most layers and fine-tune only the later, task-specific layers.

Mini Exercises

1. Explain why convolution is more parameter-efficient than a fully connected layer for images.

2. Describe a scenario where contrastive pretraining followed by a linear classifier might outperform direct supervised training.

Further Reading