Support Vector Machines

Finding the classification boundary with the widest possible margin to the nearest points of each class — and the kernel trick that lets it draw curved boundaries.

Figure (margin rule): SVM chooses the separating line with the widest margin. The dashed lines touch the nearest points of each class; those touching points are the support vectors. A wider margin makes the classifier less fragile to small movements in the data.

Definition

A Support Vector Machine (SVM) classifies data by finding the boundary (a line in 2D, a hyperplane in general) that separates two classes with the widest possible margin, that is, the largest distance to the nearest point of either class. Unlike logistic regression, which fits its boundary by maximizing the likelihood of all training points, SVM specifically seeks the boundary that maximizes the buffer zone around it.
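The definition above is often written as an optimization problem. The sketch below is the standard hard-margin form, which assumes the two classes can be separated perfectly. The symbols are notation introduced here, not taken from the text: w is the normal vector of the boundary, b its offset, x_i a training point, and y_i its label, coded as +1 or -1.

```latex
\[
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 \quad \text{for all } i
\]
\[
\text{margin width} = \frac{2}{\lVert w \rVert}
\]
```

Under this scaling, minimizing the length of w is the same as maximizing the margin width, and the support vectors are the points for which the constraint holds with equality.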

The points closest to the boundary — the ones that "support" where the margin sits — are called support vectors. Remarkably, only these points determine the boundary; every other point could move around (without crossing the margin) with no effect on the result.

Why maximize the margin?

Imagine two clusters of points with many possible separating lines. A line that barely squeezes between the closest points of each class is fragile — a new point close to the boundary could easily land on the wrong side. The maximum-margin line is the most "confident" choice: it stays as far as possible from both classes.
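To make the comparison concrete, here is a minimal sketch that measures the margin of two candidate lines on a small made-up dataset. The data, the two lines, and the helper name `geometric_margin` are all illustrative assumptions; a line is written as w·x + b = 0.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A positive result means every point lies on its correct side, and the
    value is then the distance from the line to the nearest point.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        label * (w[0] * x + w[1] * y + b) / norm
        for (x, y), label in zip(points, labels)
    )

# Hypothetical toy data: class -1 on the left, class +1 on the right.
points = [(1.0, 2.0), (0.0, 0.0), (0.0, 4.0), (3.0, 2.0), (5.0, 1.0), (5.0, 3.0)]
labels = [-1, -1, -1, 1, 1, 1]

# Two vertical lines that both separate the classes perfectly.
tight_line = ((1.0, 0.0), -1.2)  # x = 1.2, hugging the left class
wide_line = ((1.0, 0.0), -2.0)   # x = 2.0, midway between the classes

tight = geometric_margin(*tight_line, points, labels)
wide = geometric_margin(*wide_line, points, labels)
print(f"tight line margin: {tight:.2f}")
print(f"wide line margin:  {wide:.2f}")
```

Both lines classify every training point correctly, but the tight line leaves only about 0.2 units of slack next to the left class, while the centred line leaves 1 unit on each side.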

Try it

If you removed a point that's not a support vector and far from the boundary, would the SVM's decision boundary change?

Solution

No. The boundary is determined entirely by the support vectors (the closest points). A point far from the margin contributes nothing to where the maximum-margin boundary sits, so removing it leaves the boundary unchanged.
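The answer can be checked with a small experiment. The sketch below is a brute-force illustration rather than how real SVM solvers work (they solve a quadratic program): it tries many directions in 2D and keeps the one with the widest gap between the classes. The function name `max_margin_line`, the helper `without`, and the toy data are assumptions made for this example, and the search assumes both classes are present.

```python
import math

def max_margin_line(points, labels, steps=3600):
    """Brute-force search for the widest-margin separating line in 2D.

    Tries `steps` evenly spaced unit directions w. For each direction, the
    best offset b puts the line midway between the two classes' projections.
    Returns (margin, w, b), or None if no tried direction separates them.
    """
    best = None
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))
        pos = [w[0] * x + w[1] * y for (x, y), lab in zip(points, labels) if lab == 1]
        neg = [w[0] * x + w[1] * y for (x, y), lab in zip(points, labels) if lab == -1]
        gap = min(pos) - max(neg)  # positive only if this direction separates
        if gap > 0 and (best is None or gap / 2 > best[0]):
            b = -(min(pos) + max(neg)) / 2
            best = (gap / 2, w, b)
    return best

# Hypothetical toy data: (1, 2) and (3, 2) are the closest points of each class.
points = [(1.0, 2.0), (0.0, 0.0), (0.0, 4.0), (3.0, 2.0), (5.0, 1.0), (5.0, 3.0)]
labels = [-1, -1, -1, 1, 1, 1]

def without(index):
    """Copy of the data with one point dropped."""
    keep = [i for i in range(len(points)) if i != index]
    return [points[i] for i in keep], [labels[i] for i in keep]

full = max_margin_line(points, labels)
no_far_point = max_margin_line(*without(1))       # drop (0, 0), far from the boundary
no_support_vector = max_margin_line(*without(0))  # drop (1, 2), a support vector

print("full data:             ", full)
print("far point removed:     ", no_far_point)
print("support vector removed:", no_support_vector)
```

On this toy data, dropping the far point leaves the margin, direction, and offset exactly as they were, while dropping the support vector (1, 2) shifts the boundary toward the left class and widens the margin.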

Related concepts

Logistic Regression (Machine Learning · Supervised Learning): Modelling the probability of a binary outcome using the sigmoid function, fitting by maximum likelihood or gradient descent.

Decision Trees (Machine Learning · Supervised Learning): Flowchart-like models that recursively partition the feature space by asking yes/no questions; interpretable but prone to overfitting.

K-Nearest Neighbors (Machine Learning · Supervised Learning): Classifying or predicting by looking at the k closest training points; the simplest non-parametric method, intuitive yet powerful.

Regularization (Machine Learning · Model Training): Adding a penalty on model complexity to prevent overfitting. L1 (Lasso) induces sparsity, L2 (Ridge) shrinks coefficients smoothly.