Linear Regression from Scratch with NumPy
Build linear regression from the ground up using NumPy, understand the math behind gradient descent, and implement it without any ML library.
Understanding linear regression deeply means implementing it yourself. No scikit-learn, no frameworks — just NumPy and math.
The Hypothesis
Linear regression predicts a continuous output $y$ from inputs $\mathbf{x}$:
$$h_\theta(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots = \mathbf{x}^T \boldsymbol{\theta}$$
In vectorized form (where $x_0 = 1$ for the bias term):
$$\hat{\mathbf{y}} = X \boldsymbol{\theta}$$
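As a quick sketch of the vectorized hypothesis (the array values here are illustrative assumptions, not data from this article):

```python
import numpy as np

# 4 samples, 2 features (illustrative values)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])
theta = np.array([0.5, 2.0, -1.0])  # [bias, w1, w2]

# Prepend x_0 = 1 so the bias folds into a single matrix product
X_b = np.c_[np.ones((X.shape[0], 1)), X]
y_hat = X_b @ theta  # one prediction per row, no Python loop
```

Folding the bias into the design matrix is what lets every later formula stay a single matrix expression.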
The Cost Function
Mean Squared Error (MSE) measures how wrong our predictions are:
$$J(\boldsymbol{\theta}) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)})^2 = \frac{1}{2m} \|X\boldsymbol{\theta} - \mathbf{y}\|^2$$
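The cost formula above translates directly into NumPy (predictions and targets below are made-up toy values):

```python
import numpy as np

# Hypothetical predictions and targets for m = 4 samples
y_hat = np.array([1.0, 2.0, 3.0, 4.0])
y     = np.array([1.5, 2.0, 2.0, 5.0])

m = len(y)
# J(theta) = (1 / 2m) * sum of squared errors
cost = (1 / (2 * m)) * np.sum((y_hat - y) ** 2)
```

The 1/2 factor is a convention: it cancels the 2 that appears when differentiating the square, leaving a cleaner gradient.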
Gradient Descent
To minimize $J$, update parameters in the direction of the negative gradient:
$$\boldsymbol{\theta} := \boldsymbol{\theta} - \alpha \nabla_\theta J = \boldsymbol{\theta} - \frac{\alpha}{m} X^T (X\boldsymbol{\theta} - \mathbf{y})$$
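A single update step, written out on toy data (the matrix, targets, and learning rate are assumptions for illustration):

```python
import numpy as np

X_b = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])   # bias column already prepended
y = np.array([1.0, 3.0, 5.0])  # exactly y = 1 + 2x
theta = np.zeros(2)
alpha = 0.1
m = len(y)

errors = X_b @ theta - y          # X @ theta - y
grad = (1 / m) * X_b.T @ errors   # (1/m) X^T (X theta - y)
theta = theta - alpha * grad      # move against the gradient
```

Repeating this step is all that gradient descent does; the full training loop in the implementation below is this update inside a `for` loop.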
Implementation
```python
import numpy as np
from typing import Tuple


class LinearRegression:
    def __init__(self, learning_rate: float = 0.01, epochs: int = 1000):
        self.lr = learning_rate
        self.epochs = epochs
        self.theta: np.ndarray | None = None
        self.loss_history: list[float] = []

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearRegression':
        m, n = X.shape
        # Add bias column (x_0 = 1)
        X_b = np.c_[np.ones((m, 1)), X]
        # Initialize parameters to zero
        self.theta = np.zeros(n + 1)
        for epoch in range(self.epochs):
            predictions = X_b @ self.theta
            errors = predictions - y
            cost = (1 / (2 * m)) * np.sum(errors ** 2)
            self.loss_history.append(cost)
            gradient = (1 / m) * X_b.T @ errors
            self.theta -= self.lr * gradient
            if epoch % 100 == 0:
                print(f"Epoch {epoch:4d}  Loss: {cost:.6f}")
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        if self.theta is None:
            raise RuntimeError("Model not trained yet")
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        return X_b @ self.theta

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """R² coefficient of determination."""
        y_pred = self.predict(X)
        ss_res = np.sum((y - y_pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1 - ss_res / ss_tot
```
Test on Synthetic Data
```python
# Generate data: y = 3x + 2 + noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, (200, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1.5, 200)

# Train
model = LinearRegression(learning_rate=0.01, epochs=1000)
model.fit(X, y)

print(f"Intercept (θ₀): {model.theta[0]:.4f}")  # ≈ 2.0
print(f"Slope (θ₁): {model.theta[1]:.4f}")      # ≈ 3.0
print(f"R²: {model.score(X, y):.4f}")
```
Output:

```
Intercept (θ₀): 2.0314
Slope (θ₁): 2.9887
R²: 0.9873
```
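As a sanity check (my addition, not part of the original walkthrough), the same coefficients can be recovered in closed form with `np.linalg.lstsq`, which solves the least-squares problem directly rather than iteratively:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, (200, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1.5, 200)

# Solve min ||X_b theta - y||^2 directly
X_b = np.c_[np.ones((X.shape[0], 1)), X]
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta)  # close to [2, 3]
```

If gradient descent has converged, its parameters should agree with this closed-form solution to several decimal places.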
Feature Scaling
Gradient descent converges much faster with normalized features:
```python
def normalize(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8  # avoid division by zero
    return (X - mean) / std, mean, std
```
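A self-contained usage sketch (the `normalize` helper is repeated here so the block runs on its own; the feature values are assumptions):

```python
import numpy as np
from typing import Tuple

def normalize(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8  # avoid division by zero
    return (X - mean) / std, mean, std

# Two features on wildly different scales (illustrative values)
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])
X_norm, mean, std = normalize(X)
# Each column now has mean ~0 and std ~1
```

Note that the returned `mean` and `std` must be kept and applied to any new data at prediction time, so that training and inference see features on the same scale.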
Linear regression is a foundation of machine learning: the ideas here — a cost function, its gradient, and iterative parameter updates — reappear in nearly every more complex algorithm.
