Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes' theorem, with the "naive" (and often unrealistic) assumption that features are conditionally independent given the class.
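For a class $C$ and features $x_1, \dots, x_n$, Bayes' theorem combined with the independence assumption gives:
$$P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$$
The predicted class is: $\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)$.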
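A minimal Python sketch of this decision rule follows. It assumes discrete features whose likelihoods $P(x_i \mid C)$ are already known and stored in plain dictionaries; the function name `predict`, the data layout, and the numbers are illustrative, not from any library. It sums logarithms instead of multiplying probabilities. Because $\log$ is monotonic this picks the same class, and it avoids floating-point underflow when there are many features.

```python
import math

def predict(priors, likelihoods, features):
    """Return (best_class, log_scores) for observed discrete feature values.

    priors:      dict class -> P(C)
    likelihoods: dict class -> list with one dict per feature, value -> P(x_i = value | C)
    features:    observed values x_1..x_n, in the same order as the likelihood lists
    """
    log_scores = {}
    for c, prior in priors.items():
        # log P(C) + sum_i log P(x_i | C); assumes every probability is > 0,
        # since math.log(0) raises ValueError (the zero-frequency problem).
        score = math.log(prior)
        for i, x in enumerate(features):
            score += math.log(likelihoods[c][i][x])
        log_scores[c] = score
    best = max(log_scores, key=log_scores.get)
    return best, log_scores

# Hypothetical two-class model with two binary features (made-up numbers).
priors = {"a": 0.5, "b": 0.5}
likelihoods = {
    "a": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "b": [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}
label, log_scores = predict(priors, likelihoods, [1, 0])
print(label)  # a
```

Here class "a" scores $0.5 \cdot 0.8 \cdot 0.7 = 0.28$ against $0.5 \cdot 0.1 \cdot 0.4 = 0.02$ for "b". The normalizing constant $P(x_1, \dots, x_n)$ is never computed, since it is the same for every class and cannot change the argmax.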
Despite the "naive" independence assumption being almost always wrong in practice, Naive Bayes often works surprisingly well.
- A generative model: it models how data is produced for each class, not just the decision boundary
- Trains extremely fast: just counts and averages, no iterative optimization needed
- Needs very little training data relative to more flexible classifiers
- Naturally handles many features and many classes without modification
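Common pitfalls:
- Zero-frequency problem: if a feature value never occurs with a class in the training data, its estimated likelihood $P(x_i \mid C)$ is exactly 0. That single zero forces the whole product, and hence the posterior for that class, to 0 regardless of any other evidence. Laplace smoothing exists specifically to avoid this.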
- Trusting the predicted probabilities: because the independence assumption is usually false, NB's predicted probabilities tend to be overconfident (pushed toward 0 or 1) even when its classification decisions are correct
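Both the "just counts" training and the zero-frequency problem show up in a short sketch. It assumes a categorical Naive Bayes over discrete features and a tiny made-up dataset. `train`, `posterior`, and the parameter name `alpha` are illustrative choices, not a library API. `alpha` is the additive smoothing strength: `alpha=1` is Laplace smoothing, and `alpha=0` gives raw frequencies.

```python
import math
from collections import Counter

def train(X, y, alpha=1.0):
    """Fit categorical Naive Bayes by counting; alpha is the additive smoothing strength."""
    n_features = len(X[0])
    class_counts = Counter(y)
    # counts[c][i][v] = number of training samples of class c whose feature i equals v
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    values = [set() for _ in range(n_features)]  # distinct values seen for each feature
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            counts[c][i][v] += 1
            values[i].add(v)
    priors = {c: k / len(y) for c, k in class_counts.items()}

    def likelihood(c, i, v):
        # (count + alpha) / (N_c + alpha * K_i), where K_i = distinct values of feature i.
        # With alpha = 0, a value never seen with class c gets probability exactly 0.
        return (counts[c][i][v] + alpha) / (class_counts[c] + alpha * len(values[i]))

    return priors, likelihood

def posterior(priors, likelihood, xs):
    """Normalized P(C | xs) for every class."""
    joint = {c: p * math.prod(likelihood(c, i, v) for i, v in enumerate(xs))
             for c, p in priors.items()}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

# Made-up data: in training, feature 0 is always 1 for spam and always 0 for ham.
X = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]
query = (0, 1)  # feature 0 = 0 was never seen with spam

raw_priors, raw_lik = train(X, y, alpha=0.0)
raw = posterior(raw_priors, raw_lik, query)
sm_priors, sm_lik = train(X, y, alpha=1.0)
smoothed = posterior(sm_priors, sm_lik, query)
print(raw["spam"])       # 0.0: one unseen value wipes out all other evidence
print(smoothed["spam"])  # about 0.27 with Laplace smoothing
```

Training is a single pass over the data with no iterative optimization. With smoothing, the spam posterior here is $0.06 / (0.06 + 0.16) = 3/11$ rather than exactly 0.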
Features: "FREE" in subject (yes/no), exclamation marks (count).
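Parameters: the prior $P(\text{spam})$ and the likelihoods $P(\text{FREE} \mid \text{spam})$ and $P(\text{FREE} \mid \text{ham})$.
For an email with "FREE", compare the unnormalized posteriors $P(\text{spam}) \, P(\text{FREE} \mid \text{spam})$ (spam) and $P(\text{ham}) \, P(\text{FREE} \mid \text{ham})$ (ham). Spam is much more likely.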
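The sketch below plugs hypothetical parameter values (not taken from the example above) into this calculation purely to show the arithmetic. It also assumes something the text does not specify: the exclamation-mark count follows a Poisson distribution within each class, which is one common modelling choice for counts. Under conditional independence, that feature simply contributes one more multiplicative factor.

```python
import math

# Hypothetical parameters, chosen only for illustration.
P_SPAM = 0.3
P_FREE = {"spam": 0.6, "ham": 0.05}    # P("FREE" in subject | class)
EXCL_RATE = {"spam": 3.0, "ham": 0.5}  # assumed Poisson mean of the "!" count per class

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def spam_posterior(has_free, n_excl=None):
    """P(spam | features); pass n_excl=None to use the "FREE" feature alone."""
    priors = {"spam": P_SPAM, "ham": 1 - P_SPAM}
    joint = {}
    for c, prior in priors.items():
        p = prior * (P_FREE[c] if has_free else 1 - P_FREE[c])
        if n_excl is not None:
            # Conditional independence: the count adds one more multiplicative factor.
            p *= poisson_pmf(n_excl, EXCL_RATE[c])
        joint[c] = p
    return joint["spam"] / (joint["spam"] + joint["ham"])

print(round(spam_posterior(True), 3))     # 0.837 = 0.18 / (0.18 + 0.035)
print(round(spam_posterior(True, 4), 3))  # 0.998
```

With these made-up numbers, "FREE" alone gives roughly an 84% spam posterior, and four exclamation marks push it close to 1.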
Why is Naive Bayes called "naive"? Give an example where the independence assumption clearly fails.
Solution
It's naive because real features are almost never conditionally independent. Example: in text classification, the words "New" and "York" are strongly correlated: if "New" appears, "York" is much more likely to appear too. Naive Bayes nevertheless multiplies their probabilities as if they were independent given the class. This underestimates their joint probability and effectively counts the same evidence twice. Despite this flaw, Naive Bayes often works well because its probabilities are only used to decide which class has the highest score, not as exact probability estimates.
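The double-counting effect can be sketched with likelihood ratios. The code assumes a hypothetical corpus in which "York" appears exactly when "New" does, so the second word carries no new information. The ratio value 3 and the function name `posterior_from_ratios` are made up for illustration.

```python
def posterior_from_ratios(prior, ratios):
    """P(A | x) from P(A) and per-feature likelihood ratios P(x_i | A) / P(x_i | B)."""
    odds = prior / (1 - prior)
    for r in ratios:
        odds *= r  # Naive Bayes multiplies in every feature's evidence independently
    return odds / (1 + odds)

ratio_new = 3.0  # hypothetical P("New" | A) / P("New" | B)
only_new = posterior_from_ratios(0.5, [ratio_new])
# "York" always co-occurs with "New" here, so its ratio is also 3, yet NB counts it again.
new_and_york = posterior_from_ratios(0.5, [ratio_new, ratio_new])
print(only_new, new_and_york)  # 0.75 0.9
```

In this setup the correct posterior given both words is still 0.75, but Naive Bayes reports 0.9. Both values are on the same side of 0.5, so the predicted class does not change; only the confidence is inflated, which matches the overconfidence pitfall described earlier.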