Linear Regression
Fitting a straight line to data to model and predict the relationship between two variables.
Linear regression fits a straight line through a scatter of data points to model the relationship between two variables.
Given data on $x$ and $y$, we find the line that best summarises the pattern, where "best" means the line minimises the total squared vertical distance between the actual $y$ values and the line's predictions:
$$\hat{y} = mx + b$$
- $m$ is the slope: how much $\hat{y}$ changes for each unit increase in $x$
- $b$ is the intercept: the predicted value of $y$ when $x = 0$
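To make "minimises the total squared distance" concrete, the criterion and the standard closed-form solution can be written out as below. This is a sketch under assumed notation: the line is written as $\hat{y} = mx + b$, the data points as $(x_i, y_i)$ for $i = 1, \dots, n$, and $\bar{x}$, $\bar{y}$ denote the sample means. The symbol names are illustrative choices rather than something fixed by the text.

$$
\text{SSE}(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
$$

$$
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\,\bar{x}
$$

The second line gives the values of $m$ and $b$ that make the sum of squared errors as small as possible. The formula for $b$ also shows that the fitted line always passes through the point $(\bar{x}, \bar{y})$.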
Data from 8 students:
| Hours studied | Exam score |
|---|---|
| 1 | 45 |
| 2 | 52 |
| 3 | 58 |
| 4 | 64 |
| 5 | 70 |
| 6 | 75 |
| 7 | 79 |
| 8 | 83 |
The regression line turns out to be approximately $\hat{y} = 5.5x + 42$.
Prediction: a student who studies 5 hours is predicted to score $5.5(5) + 42 = 69.5$ points.
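As a check on the fitted line, the least-squares slope and intercept can be computed directly from the table. The following is a minimal sketch in plain Python. The helper name `fit_line` and the variable names are illustrative assumptions, not part of the lesson.

```python
# Hours studied and exam scores, copied from the table of 8 students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [45, 52, 58, 64, 70, 75, 79, 83]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, scores)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted score at 5 hours = {slope * 5 + intercept:.1f}")
```

For this table the unrounded coefficients come out near 5.45 (slope) and 41.2 (intercept), giving about 68.5 points at 5 hours. Predictions made from a rounded version of the equation can differ from these by roughly a point.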
Using the regression equation $\hat{y} = 5.5x + 42$, predict the score for a student who studies 10 hours. Should you trust this prediction? Why or why not?
Solution
$\hat{y} = 5.5(10) + 42 = 97$. The prediction is 97 points.
Be cautious: this is extrapolation — predicting outside the range of the data (1–8 hours). The relationship may not continue to be linear beyond 8 hours. Also, a score above 100 may not be possible if the test is capped. Regression predictions are most reliable within the range of observed data.
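The caution about extrapolation can be built into a prediction helper that reports whether the input lies outside the observed range. This is a sketch under stated assumptions: the function name `predict_with_range_check` is hypothetical, and the coefficients 5.5 and 42 are assumed rounded values chosen to be consistent with the 97-point answer above. The 1 to 8 hour range comes from the data table.

```python
def predict_with_range_check(x, slope, intercept, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y = slope * x + intercept.

    is_extrapolation is True when x lies outside [x_min, x_max],
    the range of x values the line was fitted on.
    """
    prediction = slope * x + intercept
    is_extrapolation = not (x_min <= x <= x_max)
    return prediction, is_extrapolation


# Assumed rounded coefficients (illustrative, consistent with the 97-point answer).
SLOPE, INTERCEPT = 5.5, 42.0
# Observed range of hours studied in the table.
X_MIN, X_MAX = 1, 8

for hours in (5, 10):
    score, extrapolated = predict_with_range_check(hours, SLOPE, INTERCEPT, X_MIN, X_MAX)
    note = "extrapolation, treat with caution" if extrapolated else "within observed range"
    print(f"{hours} hours -> {score:.1f} points ({note})")
```

Returning the flag alongside the number, rather than refusing to predict, leaves the judgment to the reader: the arithmetic is still valid, but the linear pattern has only been observed between 1 and 8 hours.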