# Simple linear regression
Imagine you’re a restaurant owner. You notice that on warmer days, more people buy ice cream. If you could quantify that relationship, you could predict sales from tomorrow’s weather forecast. That’s exactly what simple linear regression does. It’s one of the most fundamental tools in statistics and machine learning, and despite its name, it’s genuinely simple.

## What Is Simple Linear Regression?

At its core, simple linear regression models the relationship between two continuous variables: it finds the best straight line that describes how Y changes when X changes. Think back to high school algebra: y = mx + b. Linear regression is the same idea, just with fancier terminology and statistical rigor.

## The Formula (Don’t Worry, It’s Painless)

The population model looks like this:

Y = β₀ + β₁X + ε

Here’s what it means in plain English:

| Symbol | Meaning | Plain Translation |
|---|---|---|
| Y | Dependent variable | What you’re predicting |
| X | Independent variable | What you’re using to predict |
| β₀ | Intercept | Value of Y when X equals zero |
| β₁ | Slope | How much Y changes when X increases by 1 unit |
| ε | Error term | Stuff your model can’t explain |

The fitted model (what you actually use) is simply:

Ŷ = b₀ + b₁X

where Ŷ (pronounced “Y-hat”) is your prediction, and b₀ and b₁ are the estimated intercept and slope.

## A Concrete Example

Let’s say you want to predict exam scores based on hours studied.

| Hours Studied (X) | Actual Score (Y) |
|---|---|
| 1 | 55 |
| 2 | 65 |
| 3 | 70 |
| 4 | 80 |

After running the regression, you get this line:

Score = 47.5 + 8 × Hours

How to interpret this: the intercept (47.5) is the predicted score for a student who studies zero hours, and the slope (8) means each additional hour of study adds about 8 points. So if a student studies 5 hours: 47.5 + 8(5) = 87.5 predicted score. Pretty useful, right?

## How Does It Find the “Best” Line?

The method used is called Ordinary Least Squares (OLS) – a name that sounds complicated but isn’t. OLS finds the line that minimizes the sum of squared residuals. What’s a residual? The difference between your actual Y value and your predicted Ŷ value. Imagine drawing a line through your data points. Some points are above the line, some below. The residuals are those vertical distances.
OLS squares them all (so negatives don’t cancel positives) and adds them up. The line with the smallest total wins. That’s it. That’s the magic.

## The Four Assumptions You Should Know

Linear regression works well when certain conditions are met. Think of these as the rules of the road:

### 1. Linearity

The relationship between X and Y must be linear. If your data looks like a U-shape or an S-curve, a straight line won’t cut it.

### 2. Independence

Each observation should be independent of the others. This fails with time series data (today’s stock price depends on yesterday’s) or clustered data (students in the same classroom).

### 3. Homoscedasticity (say that three times fast)

The spread of residuals should be roughly constant across all X values. If predictions are wildly inaccurate for high X values but spot-on for low X values, you have a problem.

### 4. Normality (mostly for inference)

The errors should be roughly normally distributed. This matters primarily if you’re calculating confidence intervals or p-values.

Quick check: Plot your residuals. If they look random with no obvious patterns, you’re probably fine.

## How Good Is Your Model?

You’ve run the regression. Now what? Here are the key metrics to evaluate your model:

### R-squared (R²)

This tells you what proportion of the variance in Y is explained by X. It ranges from 0 to 1. Higher is better, but beware: adding any variable increases R², even useless ones.

### Residual Standard Error (RSE)

This is the typical size of your prediction errors, measured in the same units as Y. If RSE = 5 points and you’re predicting exam scores, your predictions are typically off by about ±5 points.

### P-value for the Slope

This tests whether the slope is significantly different from zero.

## When Should You Actually Use It?

Simple linear regression shines when you have one clear predictor, a roughly linear relationship, and you need interpretability over raw predictive power.

## Quick Python Implementation

Want to try this yourself?
Here’s a minimal example using statsmodels. The output gives you coefficients, R-squared, p-values, and diagnostic information – everything you need to interpret your model.
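As a sanity check, the OLS line can also be computed by hand from the closed-form formulas, b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄ – a minimal sketch using only plain Python on the exam-score table:

```python
# OLS closed-form estimates for the exam-score table
xs = [1, 2, 3, 4]
ys = [55, 65, 70, 80]

mean_x = sum(xs) / len(xs)  # 2.5
mean_y = sum(ys) / len(ys)  # 67.5

# Numerator: co-variation of X and Y; denominator: variation of X
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)

slope = sxy / sxx                    # 8.0
intercept = mean_y - slope * mean_x  # 47.5

print(f"Score = {intercept} + {slope} * Hours")
```

This reproduces the fitted line from the example, which is a quick way to confirm that a library’s output matches the textbook formulas.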

