PyTorch I — Tensors, Matmul & Broadcasting

Tensor creation, matrix multiplication with torch.matmul, element-wise ops, broadcasting rules, reshape/transpose, in-place operations, and mean aggregations.

Overview

PyTorch tensors are the building blocks of deep learning: multi-dimensional arrays that run on CPU or GPU and support automatic differentiation. This chapter covers tensor creation, dtype and device, shape semantics, broadcasting rules, matrix multiplication, reshape/view/transpose/permute, and autograd basics.

You Will Learn

  • Tensor creation: from lists, NumPy, zeros, ones, randn, arange
  • dtype and device (CPU vs CUDA)
  • Shape semantics and common bugs
  • Broadcasting rules with clear examples
  • Matrix multiplication (matmul) rules and examples
  • reshape, view, transpose, permute
  • autograd: requires_grad, backward()

Main Content

Tensors: Creation and Properties

Create tensors with torch.tensor(), torch.zeros(), torch.ones(), torch.randn(), torch.arange(). Every tensor has .shape, .dtype (float32, int64, etc.), and .device (cpu or cuda). Check these constantly — shape mismatches are the #1 source of bugs in deep learning.

Shape Semantics

Convention: (batch, features) for 2D, (batch, channels, height, width) for images. A design matrix X has shape (n_samples, n_features). A batch of images has shape (B, C, H, W). A linear layer expects input of shape (batch, in_features) and stores its weight as (out_features, in_features); the output is (batch, out_features).
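These shape conventions can be checked directly; a minimal sketch (the sizes 16, 64, 32 are arbitrary):

```python
import torch

batch, in_f, out_f = 16, 64, 32
x = torch.randn(batch, in_f)          # (batch, in_features)

layer = torch.nn.Linear(in_f, out_f)
print(layer.weight.shape)             # (out_features, in_features)
y = layer(x)
print(y.shape)                        # (batch, out_features)
```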

Broadcasting

Dimensions are compared from right to left. They are compatible if equal or one is 1. (3, 4) and (4,) → (3, 4). (3, 1) and (1, 5) → (3, 5). Example: subtract a mean vector from a batch: x - x.mean(dim=0) broadcasts the mean across the batch.

Matrix Multiplication

torch.matmul(A, B) or A @ B. For 2D: (m, k) @ (k, n) → (m, n). For batches: (b, m, k) @ (b, k, n) → (b, m, n). Element-wise * is the Hadamard product — same shape required. A linear layer does y = x @ W.T + b.
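The batched case can be verified in a few lines; the sizes below are illustrative:

```python
import torch

A = torch.randn(4, 3, 5)   # (b, m, k)
B = torch.randn(4, 5, 2)   # (b, k, n)
C = A @ B                  # matmul over the last two dims, batch carried through
print(C.shape)             # (b, m, n) = (4, 3, 2)
```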

Reshape, View, Transpose, Permute

view requires a contiguous tensor and never copies; reshape returns a view when possible and copies otherwise. squeeze() removes dims of size 1; unsqueeze(dim) adds one. transpose(dim0, dim1) swaps two dimensions. permute(dims) reorders all dimensions — e.g., (B, H, W, C) → (B, C, H, W) for conv layers. Note that transpose and permute produce non-contiguous tensors, so a following view may fail; use reshape or .contiguous() instead.
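A quick sketch of these operations (the image sizes are arbitrary):

```python
import torch

imgs = torch.randn(8, 28, 28, 3)     # (B, H, W, C), channels-last
chw = imgs.permute(0, 3, 1, 2)       # (B, C, H, W) for conv layers
flat = imgs.reshape(8, -1)           # (8, 28*28*3) = (8, 2352)
col = torch.randn(5).unsqueeze(1)    # (5,) -> (5, 1)

# chw.view(8, -1) would raise: permute made chw non-contiguous.
# reshape (or .contiguous().view(...)) handles that case.
print(chw.shape, flat.shape, col.shape)
```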

Autograd Basics

Set requires_grad=True on tensors you want to differentiate. Operations build a computation graph. Call .backward() on a scalar loss to compute gradients. Access gradients via .grad. Use torch.no_grad() when you don't need gradients (e.g., validation).
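torch.no_grad() can be sketched as follows; inside the context, no graph is recorded:

```python
import torch

w = torch.randn(3, requires_grad=True)

with torch.no_grad():
    pred = w * 2          # computed without building a graph
print(pred.requires_grad)  # False: no gradients flow through pred
```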

Examples

Tensor Creation and Shape

Create tensors and inspect properties.

import torch
x = torch.randn(3, 4)
print(x.shape)   # torch.Size([3, 4])
print(x.dtype)   # torch.float32
print(x.device)  # cpu

Broadcasting

Subtract per-feature mean from a batch.

import torch
X = torch.randn(32, 10)  # 32 samples, 10 features
mean = X.mean(dim=0)     # (10,)
X_centered = X - mean    # (32,10) - (10,) broadcasts to (32,10)

Matrix Multiplication

Linear transformation: y = x @ W.T, with x of shape (batch, in_features) and W of shape (out_features, in_features).

import torch
batch, in_f, out_f = 8, 64, 32
x = torch.randn(batch, in_f)
W = torch.randn(out_f, in_f)
y = x @ W.T  # (8, 64) @ (64, 32) -> (8, 32)

Autograd

Compute gradients for a simple loss.

import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2
loss = y.sum()
loss.backward()
print(x.grad)  # tensor([2., 4., 6.])

Common Mistakes

Using * instead of @ for matrix multiplication

Why: * is element-wise; you get shape errors or wrong results.

Fix: Use torch.matmul or @ for matrix multiplication.
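A minimal demonstration of the difference:

```python
import torch

A = torch.ones(2, 3)
B = torch.ones(3, 2)

matmul = A @ B      # (2, 3) @ (3, 2) -> (2, 2): true matrix product
hadamard = A * A    # element-wise: shapes must match (or broadcast)

try:
    A * B           # (2, 3) * (3, 2): not broadcastable
except RuntimeError:
    print("element-wise * rejects incompatible shapes")
```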

Broadcasting producing wrong shapes silently

Why: size-1 dimensions expand silently; e.g., subtracting a (3, 1) tensor from a (3,) tensor broadcasts to (3, 3) instead of raising an error.

Fix: Check .shape after every operation; use unsqueeze explicitly when needed.
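A sketch of the silent-broadcast trap and the explicit fix:

```python
import torch

a = torch.arange(3.0)                 # shape (3,)
b = torch.arange(3.0).reshape(3, 1)   # shape (3, 1)

c = a - b                 # silently broadcasts to (3, 3)!
print(c.shape)            # torch.Size([3, 3])

d = a.unsqueeze(1) - b    # explicit (3, 1) - (3, 1) -> (3, 1)
print(d.shape)            # torch.Size([3, 1])
```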

In-place operations breaking autograd

Why: x.add_(1) modifies x in place; autograd may still need the original value for backward, and an in-place op on a leaf tensor with requires_grad=True raises a RuntimeError outright.

Fix: Avoid in-place ops on tensors with requires_grad=True; use x = x + 1.
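A minimal sketch of the failure and the out-of-place fix:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

try:
    x.add_(1)          # in-place op on a leaf that requires grad
except RuntimeError as e:
    print("in-place on a grad leaf fails:", e)

x2 = x + 1             # out-of-place: builds a proper graph node
print(x2.requires_grad)  # True
```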

Mini Exercises

1. What is the output shape of torch.randn(5, 3) @ torch.randn(3, 7)?

2. Given x of shape (3, 4), write one line to add a batch dimension so it becomes (1, 3, 4).

3. Why does loss.backward() require loss to be a scalar?

Further Reading