5
Classification II — Trees, Bayes & Ensemble Methods
Summary
Beyond binary logistic regression: multinomial logistic regression with sklearn, decision trees that split on entropy, Gaussian Naive Bayes for fast probabilistic classification, and practical tools for the real world: confusion matrices, plus class-imbalance handling with class_weight='balanced' and SMOTE, all applied to the Iris and large-scale Forest Covertype datasets.
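The workflow summarized above can be sketched end to end on Iris. This is an illustrative outline (not the lesson notebook itself); the train/test split ratio and random seeds are assumptions:

```python
# Sketch: the lesson's three classifiers on Iris, compared via
# confusion matrices. Split ratio and seeds are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "multinomial logreg": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(criterion="entropy", random_state=42),
    "gaussian naive bayes": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, round(accuracy_score(y_test, pred), 3))
    print(confusion_matrix(y_test, pred))  # rows: true class, cols: predicted
```

All three fit in milliseconds on Iris; the confusion matrix makes it visible which of the three species each model confuses.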
Materials
THEORY
Decision Trees, Naive Bayes & the Class Imbalance Problem
How decision trees split on entropy, why Naive Bayes is 'naive' yet powerful, multinomial logistic regression, confusion matrices, and strategies for imbalanced datasets.
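The entropy criterion mentioned above can be made concrete with a few lines of NumPy. A minimal sketch, with helper names of my own choosing (not from the lesson materials):

```python
# Sketch of the entropy criterion a decision tree uses to rank splits.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left`/`right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 1, 1, 1])  # balanced -> entropy = 1.0 bit
print(information_gain(parent, parent[:3], parent[3:]))  # perfect split -> gain 1.0
```

A tree greedily picks, at each node, the feature threshold whose split maximizes this gain; sklearn's `DecisionTreeClassifier(criterion="entropy")` does exactly this search.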
PRACTICE
Classifiers in Action — Iris, Forest Covertype & SMOTE
Train multinomial logistic regression, decision trees, and Gaussian Naive Bayes with sklearn; build confusion matrices; tackle class imbalance with balanced weights and SMOTE on a 580K-sample dataset.
Includes notebook