Variance & Standard Deviation

Concept 01 — Why It Matters

What Is "Spread" in Data?

In data science, we don't just care about the average. We also care about how much the data spreads out around that average. Two datasets can have the same mean but look completely different.

🎯

Analogy: Imagine two archers. Archer A always hits near the center — sometimes a little left, sometimes a little right. Archer B's shots are all over the place. Both have the same average hit point, but Archer A is far more consistent. Variance and Standard Deviation measure this consistency.
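To make the archer analogy concrete, here is a minimal sketch using made-up horizontal offsets from the bullseye. The numbers are illustrative assumptions, not data from this lesson. Both archers average exactly 0, yet their spreads differ by a factor of 10.

```python
import statistics

# Hypothetical horizontal offsets from the bullseye, in cm (illustrative values only)
archer_a = [-2, -1, 0, 1, 2]
archer_b = [-20, -10, 0, 10, 20]

for name, shots in [("Archer A", archer_a), ("Archer B", archer_b)]:
    mean = statistics.mean(shots)
    spread = statistics.pstdev(shots)  # population standard deviation
    print(f"{name}: mean = {mean}, std dev = {spread:.2f}")
```

The mean alone cannot tell the two archers apart; the standard deviation can.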

Let's take a concrete example. Suppose we have test scores from 6 students.

Student	Score	Notes
Alice	72	—
Bob	75	—
Carol	80	—
David	85	—
Eve	88	—
Frank	90	—

The mean (average) = (72 + 75 + 80 + 85 + 88 + 90) / 6 = 490 / 6 ≈ 81.67

Student test scores · Mean = 81.67
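The same arithmetic can be checked in a few lines of plain Python. This is only a sketch; the dictionary name `scores` is chosen here for illustration.

```python
# Test scores from the table above
scores = {"Alice": 72, "Bob": 75, "Carol": 80, "David": 85, "Eve": 88, "Frank": 90}

total = sum(scores.values())   # 490
mean = total / len(scores)     # 490 / 6
print(f"sum = {total}, n = {len(scores)}, mean = {mean:.2f}")  # mean = 81.67
```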

Concept 02 — Variance

What Is Variance?

Variance measures how far the data points lie from the mean: it is the average of the squared distances between each value and the mean. The bigger the variance, the more spread out the data is.

σ² = Σ (xᵢ − μ)² / N

Population Variance · σ² = sigma squared

Where:

xᵢ = each individual value
μ (mu) = the mean of all values
N = total number of data points
Σ = sum everything up
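Read literally, the formula translates almost symbol for symbol into code. The sketch below is one possible translation. The function name `population_variance` and the small test list are assumptions made for illustration, not part of the lesson's dataset.

```python
def population_variance(values):
    """Return sum((x - mu)**2) / N for a non-empty sequence of numbers."""
    n = len(values)                       # N: number of data points
    if n == 0:
        raise ValueError("variance is undefined for an empty sequence")
    mu = sum(values) / n                  # mu: the mean of all values
    return sum((x - mu) ** 2 for x in values) / n

# Hypothetical data with mean 5; squared deviations sum to 32, and 32 / 8 = 4.0
print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```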

Step-by-Step Calculation

Find the Mean

μ = (72 + 75 + 80 + 85 + 88 + 90) / 6 = 81.67

Add all values, divide by count.

Subtract Mean from Each Value → Get Deviation

72 − 81.67 = −9.67 | 75 − 81.67 = −6.67 | 80 − 81.67 = −1.67

85 − 81.67 = 3.33 | 88 − 81.67 = 6.33 | 90 − 81.67 = 8.33

Negative = below mean · Positive = above mean. Note: deviations from the mean always sum to 0 (here, up to rounding).

Square Each Deviation (to remove negatives)

(−9.67)² = 93.51 | (−6.67)² = 44.49 | (−1.67)² = 2.79

(3.33)² = 11.09 | (6.33)² = 40.07 | (8.33)² = 69.39

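Squaring makes every value non-negative and penalizes big deviations more. These squares are computed from the rounded deviations above, so they differ slightly from the unrounded values (for example, 93.44 instead of 93.51). The final variance is the same to two decimals either way.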

Sum All Squared Deviations

93.51 + 44.49 + 2.79 + 11.09 + 40.07 + 69.39 = 261.34

Divide by N (number of data points)

Variance σ² = 261.34 / 6 ≈ 43.56

This is the average of the squared deviations.

Student	Score (xᵢ)	Deviation (xᵢ − μ)	Squared Deviation
Alice	72	−9.67	93.51
Bob	75	−6.67	44.49
Carol	80	−1.67	2.79
David	85	+3.33	11.09
Eve	88	+6.33	40.07
Frank	90	+8.33	69.39
SUM	490	0.00	261.34
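The five steps above can be sketched in plain Python as follows. Because this version keeps full precision instead of rounding each deviation to two decimals, the squared values and their sum differ slightly from the table (261.33 rather than 261.34). The variance still rounds to 43.56.

```python
import math

scores = {"Alice": 72, "Bob": 75, "Carol": 80, "David": 85, "Eve": 88, "Frank": 90}

mu = sum(scores.values()) / len(scores)                # step 1: mean
deviations = {k: x - mu for k, x in scores.items()}    # step 2: deviations
squared = {k: d ** 2 for k, d in deviations.items()}   # step 3: squared deviations
sum_sq = sum(squared.values())                         # step 4: sum of squares
variance = sum_sq / len(scores)                        # step 5: divide by N

# Print one row per student: score, deviation, squared deviation
for name, x in scores.items():
    print(f"{name:<6}{x:>4}{deviations[name]:>+8.2f}{squared[name]:>8.2f}")

# Deviations cancel out, apart from floating-point noise
print("deviations sum to zero:", math.isclose(sum(deviations.values()), 0.0, abs_tol=1e-9))
print(f"sum of squares = {sum_sq:.2f}, variance = {variance:.2f}")
```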

⚠️ The Problem with Variance

Variance is in squared units. If our scores are in "points", the variance is in "points²" — which is hard to interpret! That's why we need Standard Deviation.

Concept 03 — Standard Deviation

What Is Standard Deviation?

Standard Deviation is simply the square root of Variance. This brings it back to the original unit, making it human-readable and interpretable.

σ = √(Variance) = √(σ²)

Standard Deviation = square root of variance

Take the Square Root of Variance

σ = √43.56 ≈ 6.60

The standard deviation is 6.60 points — back in the same unit as our scores!

💡 How to Interpret This

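A standard deviation of 6.60 points means that a typical score lies roughly 6.60 points from the mean (81.67). Strictly, it is the root-mean-square deviation, so large deviations count for more than they would in a plain average of distances. One standard deviation either side of the mean gives the range 81.67 ± 6.60 = [75.07, 88.27]; here 3 of the 6 scores (80, 85, 88) fall inside it. For roughly bell-shaped data, about 68% of values fall within one standard deviation of the mean.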
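A quick way to test an interpretation like this is to count how many scores actually land inside the one-standard-deviation band. The sketch below uses Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
import statistics

scores = [72, 75, 80, 85, 88, 90]
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population standard deviation

# Band of one standard deviation either side of the mean
low, high = mu - sigma, mu + sigma
inside = [x for x in scores if low <= x <= high]

print(f"band = [{low:.2f}, {high:.2f}]")
print(f"{len(inside)} of {len(scores)} scores inside: {inside}")
```

With only six values, the count can sit well away from the 68% rule of thumb, which applies to roughly bell-shaped data.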

📐

Variance

Average of squared differences from the mean. Hard to interpret (squared units). Useful for math and comparing datasets.

📏

Standard Deviation

Square root of variance. Easy to interpret (same unit as data). The go-to metric for spread in data science.

Concept 04 — Population vs Sample

A Critical Distinction

In real data science, you're rarely working with the entire population. You have a sample. This changes the formula slightly.

🌍

Population (σ²)

You have ALL data points.

σ² = Σ(xᵢ − μ)² / N

Divide by N.

🔬

Sample (s²)

You have only a SAMPLE of data.

s² = Σ(xᵢ − x̄)² / (N−1)

Divide by N−1 (Bessel's correction).

🏫

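Why N−1? In a sample, deviations are measured from the sample mean x̄, which is computed from the same data and therefore sits closer to the sample's values than the true mean μ does. As a result, dividing by N underestimates the population variance on average. Dividing by N−1 corrects this bias. In NumPy, np.var() and np.std() default to ddof=0 (population); pass ddof=1 for the sample versions.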
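The bias is easier to believe once you see it. The sketch below first applies both formulas to our scores, then runs a small simulation: it draws many samples of size 5 from a population whose true variance is known to be 4.0 and averages the two estimators. The simulation parameters (seed, sample size, number of trials, normal population) are assumptions chosen for illustration. It uses the standard library, where `statistics.pvariance` divides by N and `statistics.variance` divides by N−1.

```python
import random
import statistics

scores = [72, 75, 80, 85, 88, 90]
print(f"divide by N:   {statistics.pvariance(scores):.2f}")  # 43.56
print(f"divide by N-1: {statistics.variance(scores):.2f}")   # 52.27

# Simulation: many small samples from a population with known variance
rng = random.Random(0)          # fixed seed so the run is repeatable
true_sigma = 2.0                # population std dev, so true variance = 4.0
sample_size, trials = 5, 20_000

biased, corrected = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, true_sigma) for _ in range(sample_size)]
    biased.append(statistics.pvariance(sample))     # divides by N
    corrected.append(statistics.variance(sample))   # divides by N-1

print(f"average of N estimates:   {statistics.fmean(biased):.2f}")
print(f"average of N-1 estimates: {statistics.fmean(corrected):.2f}")
```

In theory the divide-by-N average should land near (N−1)/N × 4.0 = 3.2, while the divide-by-(N−1) average should land near the true value of 4.0.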

Concept 06 — Real Applications

Where Is This Used?

Variance and Standard Deviation are fundamental to almost every area of data science.

📈

Finance & Stock Market

High std dev = high risk/volatility. Investors use the std dev of returns to measure how much a stock's price fluctuates.

🤖

Machine Learning

Feature standardization (StandardScaler) subtracts each feature's mean and divides by its std dev, so features are on comparable scales during training.

🏭

Quality Control

Manufacturing uses std dev to ensure products meet specs. Low std dev = consistent products.

🧬

Medical Research

Clinical trials use std dev to measure how much patient responses vary from the average.

📊

A/B Testing

Std dev feeds into the standard error, which is used to judge statistical significance when comparing two versions of a feature.

🌦️

Weather Forecasting

Variance in temperature data helps meteorologists understand how unpredictable the weather is.
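As one example from the list above, the standardization step mentioned under Machine Learning can be sketched without any library. The helper name `standardize` is hypothetical. It uses the population standard deviation, which is also what scikit-learn documents for StandardScaler, and it raises an error for constant features, which is a choice of this sketch rather than the library's behavior.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / population std dev."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

scores = [72, 75, 80, 85, 88, 90]
z = standardize(scores)

print([round(v, 2) for v in z])
# After standardizing, the mean is ~0 and the std dev is ~1
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
```

Each z-score says how many standard deviations a value sits above or below the mean, which is what makes features with different units comparable.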

Summary — Quick Reference

Everything in One Place

Concept	Formula	Our Example	Meaning
Mean	Σxᵢ / N	81.67	Center of data
Deviation	xᵢ − μ	−9.67 to +8.33	Distance from mean
Variance (pop.)	Σ(xᵢ−μ)² / N	43.56	Avg squared spread
Variance (sample)	Σ(xᵢ−x̄)² / (N−1)	52.27	Estimated pop. variance
Std Dev (pop.)	√(σ²)	6.60	Avg spread in same unit
Std Dev (sample)	√(s²)	7.23	Sample spread estimate

🐍 Python Code

# Using NumPy
import numpy as np

scores = [72, 75, 80, 85, 88, 90]

mean     = np.mean(scores)         # 81.67
variance = np.var(scores)          # 43.56  (population)
std_dev  = np.std(scores)          # 6.60   (population)

s_variance = np.var(scores, ddof=1)   # 52.27  (sample)
s_std_dev  = np.std(scores, ddof=1)   # 7.23   (sample)