Summarising Data
Mean, median, mode, range, IQR, standard deviation, standardised scores, skewness
Measures of Average
Measures of average (central tendency) describe the 'typical' value in a dataset. The three main measures are the mean, median, and mode.
Mean: The sum of all values divided by the number of values. Affected by extreme values (outliers).
x̄ = Σx ÷ n
For grouped data: x̄ ≈ Σ(f × x) ÷ Σf (where x = class mid-point and f = class frequency)
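As a minimal sketch of both formulas, the snippet below computes a mean from raw values and an estimated mean from a grouped table. The data and the names values and groups are made up for illustration.

```python
# Raw data: mean = sum of values / number of values
values = [4, 7, 7, 9, 13]  # hypothetical data
mean = sum(values) / len(values)
print(mean)  # 8.0

# Grouped data (hypothetical table): (class lower bound, class upper bound, frequency)
groups = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

# Use each class mid-point as x, then apply sum(f * x) / sum(f)
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in groups)
total_f = sum(f for _, _, f in groups)
grouped_mean = total_fx / total_f
print(grouped_mean)  # 14.0
```

The grouped result is only an estimate, because the mid-point stands in for every value in its class.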
Median: The middle value when the data is arranged in order. For n values, the median is at position (n+1)/2; when n is even this falls between two values, so the median is the mean of those two. Not affected by outliers.
Mode: The most frequently occurring value. A dataset can have no mode, one mode (unimodal), two modes (bimodal), or more than two (multimodal). It is the only one of the three averages that works for qualitative (non-numerical) data.
Mean: use when data is roughly symmetric with no outliers. Median: use when data is skewed or has outliers. Mode: use for qualitative data or when the most common value is needed.
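To see why the median is preferred when outliers are present, the sketch below adds one extreme value to a small made-up dataset and recomputes all three averages with Python's standard statistics module. The figures are hypothetical.

```python
import statistics

salaries = [21, 23, 23, 25, 28]   # hypothetical values, in thousands
with_outlier = salaries + [120]   # same data plus one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mode_before = statistics.mode(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
mode_after = statistics.mode(with_outlier)

print(mean_before, median_before, mode_before)  # 24 23 23
print(mean_after, median_after, mode_after)     # 40 24.0 23
```

The single outlier moves the mean from 24 to 40, while the median moves only from 23 to 24 and the mode does not change.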
Measures of Spread
Measures of spread describe how spread out the data is around the average.
Range: Maximum value − minimum value. Simple to calculate but strongly affected by outliers.
Interquartile range: IQR = Q3 − Q1, where Q1 and Q3 are the lower and upper quartiles. It is the spread of the middle 50% of the data and is not affected by outliers.
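The sketch below compares the range and the IQR on a made-up dataset that contains one outlier. It assumes the quartiles are taken at positions (n+1)/4 and 3(n+1)/4, which is what the "exclusive" method of statistics.quantiles does; other textbooks and software use slightly different quartile conventions and may give different values.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 30]   # hypothetical values; 30 is an outlier

# Range: maximum minus minimum
data_range = max(data) - min(data)

# Quartiles: with n=4 the function returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(data_range)   # 28
print(q1, q3, iqr)  # 4.0 10.0 6.0
```

The outlier stretches the range to 28, but the IQR of 6 depends only on the middle half of the data.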
Standard deviation: A measure of how far the data points typically lie from the mean (the square root of the mean squared distance from the mean). A larger standard deviation means the data is more spread out.
σ = √[ Σ(x − x̄)² ÷ n ]
Or equivalently: σ = √[ (Σx² ÷ n) − x̄² ]
You will usually be given the standard deviation formula in the exam. The key steps are: (1) find the mean, (2) subtract the mean from each value, (3) square each result, (4) find the average of the squared differences, (5) take the square root.
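The five steps can be sketched in code as follows. The dataset is made up, and the last line checks the equivalent formula σ = √[ (Σx² ÷ n) − x̄² ] against the step-by-step result.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
n = len(data)

mean = sum(data) / n                      # (1) find the mean
deviations = [x - mean for x in data]     # (2) subtract the mean from each value
squared = [d ** 2 for d in deviations]    # (3) square each result
variance = sum(squared) / n               # (4) average the squared differences
sd = math.sqrt(variance)                  # (5) take the square root

# Equivalent form: mean of the squares minus the square of the mean
sd_alt = math.sqrt(sum(x ** 2 for x in data) / n - mean ** 2)

print(mean, variance, sd, sd_alt)  # 5.0 4.0 2.0 2.0
```

This divides by n, matching the formula above. In Python's statistics module, pstdev also divides by n, whereas stdev divides by n − 1 and would give a slightly larger answer.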
Standardised Scores
A standardised score (z-score) allows comparison of values from different datasets by measuring how many standard deviations a value is from the mean.
z = (x − x̄) ÷ σ
where x = individual value, x̄ = mean, σ = standard deviation
Alice scores 72 in Maths (mean 65, SD 10) and 58 in English (mean 50, SD 6). In which subject did she perform better relative to her class?
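One way to work through this question is to apply the z-score formula to each subject and compare the results. The function name z_score is chosen here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

z_maths = z_score(72, 65, 10)    # (72 - 65) / 10
z_english = z_score(58, 50, 6)   # (58 - 50) / 6

print(round(z_maths, 2))    # 0.7
print(round(z_english, 2))  # 1.33

better = "English" if z_english > z_maths else "Maths"
print(better)  # English
```

Although her raw mark is higher in Maths, Alice is about 1.33 standard deviations above the mean in English and only 0.7 above it in Maths, so she performed better in English relative to her class.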
Skewness
Skewness describes the asymmetry of a distribution.
Positive skew: The tail is on the right. Most data is clustered on the left. Typically Mean > Median > Mode.
Negative skew: The tail is on the left. Most data is clustered on the right. Typically Mean < Median < Mode.
Symmetric: The distribution is balanced. Mean = Median = Mode.
Skewness ≈ 3(Mean − Median) ÷ Standard Deviation
Positive result → positive skew. Negative result → negative skew. Zero → symmetric.
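As a rough sketch, the formula and its interpretation can be written as two small functions. The function names and the summary figures are hypothetical.

```python
def skew_coefficient(mean, median, sd):
    """Skewness estimate: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def describe(skew):
    """Interpret the sign of the skewness value."""
    if skew > 0:
        return "positive skew"
    if skew < 0:
        return "negative skew"
    return "symmetric"

# Hypothetical summary statistics for two datasets
skew_a = skew_coefficient(mean=52, median=48, sd=10)
skew_b = skew_coefficient(mean=30, median=33, sd=6)

print(skew_a, describe(skew_a))  # 1.2 positive skew
print(skew_b, describe(skew_b))  # -1.5 negative skew
```

In practice a value very close to zero is usually read as roughly symmetric rather than requiring an exact zero.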
You can identify skewness from a box plot: if the median is closer to Q1, the distribution is positively skewed. If closer to Q3, it is negatively skewed.