Ensemble Methods

Combining many weak models into one strong one: bagging reduces variance, boosting reduces bias, random forests add random feature selection to bagging, and stacking trains a meta-model on the base models' predictions.

Ensembles combine diverse, imperfect models so that their individual errors cancel out.

Core idea

The ensemble is strongest when models make different errors. Five identical models are just one model repeated.

Voting handles classification; averaging handles regression or probabilities.

Bagging (e.g. random forests): train models in parallel on bootstrap samples of the training data, which reduces variance.

Boosting (AdaBoost, gradient boosting): train models sequentially, each correcting the errors of the ones before it, which mainly reduces bias.
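The two combination rules named above, voting and averaging, can be sketched in a few lines. This is a minimal illustration rather than a library API: the function names and the example predictions are assumptions made up for the sketch.

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    """Return the most common label among the models' predictions for one example.

    Ties go to the label that appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]

def average(values):
    """Return the mean of the models' numeric predictions for one example."""
    return mean(values)

# Hypothetical predictions from three models for a single example.
class_votes = ["spam", "ham", "spam"]
regression_outputs = [2.0, 3.0, 4.0]
spam_probabilities = [0.9, 0.4, 0.8]

print(majority_vote(class_votes))      # hard voting for classification
print(average(regression_outputs))     # averaging for regression
print(average(spam_probabilities))     # averaging predicted probabilities ("soft voting")
```

Averaging probabilities keeps each model's confidence, whereas hard voting only counts the final labels.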

Definition

An ensemble method combines predictions from multiple models (base learners, often called "weak learners" when each is only modestly accurate) into one stronger prediction. The core insight: if individual models make different mistakes, averaging or voting cancels many of those mistakes out, so the group is more reliable than any single member.

The two dominant families are bagging (train many models independently and average/vote) and boosting (train models sequentially, each one focusing on what the previous ones got wrong).
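To make the contrast concrete, here is a small sketch of both families on a toy regression problem, using a one-split regression stump as the base model. Everything here is an assumption for illustration: the stump learner, the toy data, the function names, and the learning rate. The boosting part is the residual-fitting form used by gradient boosting with squared error, not AdaBoost.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump: predict mean(y) on each side of a threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:  # splitting at the largest x would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to a constant prediction
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def stump_predict(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean

def bagging(xs, ys, n_models, rng):
    """Independent stumps, each fit on a bootstrap sample; predictions are averaged."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(stump_predict(m, x) for m in models) / len(models)

def boosting(xs, ys, n_rounds, learning_rate=0.5):
    """Sequential stumps, each fit to the residuals left by the stumps before it."""
    models = []
    residuals = list(ys)
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - learning_rate * stump_predict(stump, x)
                     for x, r in zip(xs, residuals)]
    return lambda x: sum(learning_rate * stump_predict(m, x) for m in models)

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (assumed for illustration): y = x^2 on a small grid.
xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]

bagged = bagging(xs, ys, n_models=20, rng=random.Random(0))
print("bagging, 20 stumps:", mse(bagged, xs, ys))
print("boosting, 1 round: ", mse(boosting(xs, ys, 1), xs, ys))
print("boosting, 50 rounds:", mse(boosting(xs, ys, 50), xs, ys))
```

The bagged stumps never see one another's output, so they could be trained in parallel. The boosted stumps must be trained in order, because each one needs the residuals from the previous rounds.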

Why averaging helps

Imagine five decision trees, each 70% accurate but making different errors on different examples. If their errors are independent and the task is binary, the majority vote is right whenever at least three of the five trees are right, which happens about 84% of the time. Wrong answers from different trees rarely line up on the same examples.
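The size of the gain in this example can be computed exactly with the binomial distribution. The sketch below assumes a binary task, fully independent errors, and an odd number of models (so there are no ties). Real models are usually correlated, so the gain is smaller in practice. The function name is made up for illustration.

```python
from math import comb

def majority_vote_accuracy(n_models, p):
    """Probability that more than half of n independent models are correct,
    when each is correct with probability p (binary task, odd n)."""
    need = n_models // 2 + 1
    return sum(comb(n_models, k) * p**k * (1 - p)**(n_models - k)
               for k in range(need, n_models + 1))

print(round(majority_vote_accuracy(5, 0.7), 4))  # 0.8369
```

Calling the same function with a larger `n_models` shows the accuracy continuing to rise, but only while the independence assumption holds.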

Try it

Would an ensemble of five identical models (always agreeing) be expected to outperform a single one of them?

Solution

No. If the models always make the same predictions, voting changes nothing — there's no diversity of errors to cancel out. Ensembling only helps when the individual models are reasonably accurate and make somewhat independent mistakes.
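A quick simulation illustrates the answer. It is a sketch under simplified assumptions: each model is reduced to a coin flip that is correct with probability `p`, the task is binary, and the `simulate` function and its parameters are hypothetical names chosen for this example.

```python
import random

def simulate(n_examples, n_models, p, identical, rng):
    """Estimate majority-vote accuracy for identical copies versus independent models."""
    correct = 0
    for _ in range(n_examples):
        if identical:
            # One draw shared by every copy: all right or all wrong together.
            outcomes = [rng.random() < p] * n_models
        else:
            # A separate draw per model: errors are independent.
            outcomes = [rng.random() < p for _ in range(n_models)]
        if sum(outcomes) > n_models / 2:
            correct += 1
    return correct / n_examples

rng = random.Random(42)
print("identical models:  ", simulate(20000, 5, 0.7, identical=True, rng=rng))
print("independent models:", simulate(20000, 5, 0.7, identical=False, rng=rng))
```

The identical copies should score close to the single-model accuracy of 0.7, while the independent models should land noticeably higher.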

Related concepts

Machine Learning · Supervised Learning

Decision Trees: Flowchart-like models that recursively partition the feature space by asking yes/no questions; interpretable but prone to overfitting.

Machine Learning · Model Training

Bias-Variance Tradeoff: The fundamental tension between underfitting (high bias) and overfitting (high variance), understood by decomposing prediction error into its components.

Machine Learning · Model Training

Cross-Validation: Estimating model generalisation by repeatedly training on subsets and evaluating on the held-out remainder (k-fold, leave-one-out).

Machine Learning · Model Training

Model Evaluation: Confusion matrices, accuracy, precision, recall, F1 score, ROC curves, and AUC; the toolkit for measuring classifier and regressor performance.