Processing and Representing Data
Frequency tables, histograms, pie charts, stem-and-leaf, box plots, cumulative frequency
Frequency Tables
A frequency table organises raw data into groups, making it easier to identify patterns and calculate statistics.
Frequency: The number of times a value or group of values occurs in a dataset.
Class width: The range of values covered by a class interval (upper boundary minus lower boundary). For the class 10 ≤ x < 20, the class width is 20 − 10 = 10.
Mid-point: The middle value of a class interval, found by averaging its two boundaries. Used when estimating the mean from grouped data. For 10 ≤ x < 20, the mid-point is (10 + 20) ÷ 2 = 15.
Mean ≈ Σ(f × x) ÷ Σf
Where f = frequency, x = mid-point of each class
When calculating the estimated mean, always use the mid-point of each class interval, not the boundary values. The answer is an estimate because we do not know the exact values within each class.
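The estimated mean can be checked with a short Python sketch. The function name and the class data below are invented for illustration; each class is assumed to be given as (lower boundary, upper boundary, frequency).

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data as sum(f * x) / sum(f),
    where x is the mid-point of each class."""
    total_fx = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2   # use the mid-point, not a boundary
        total_fx += freq * midpoint
        total_f += freq
    return total_fx / total_f


# Hypothetical grouped data: (lower, upper, frequency)
classes = [(0, 10, 4), (10, 20, 7), (20, 30, 9)]

# sum(f * x) = 4*5 + 7*15 + 9*25 = 350 and sum(f) = 20
print(estimated_mean(classes))  # 17.5
```

The result is still only an estimate, because every value in a class is treated as if it sat at the mid-point.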
Histograms
A histogram is used to display continuous data grouped into class intervals. Unlike a bar chart, the area of each bar (not the height) represents the frequency.
Frequency density: The height of each bar in a histogram, found by dividing the frequency of the class by its class width.
Frequency Density = Frequency ÷ Class Width
Frequency = Frequency Density × Class Width
Example: A class interval 20 ≤ x < 30 has frequency 15. Calculate the frequency density. Class width = 30 − 20 = 10, so frequency density = 15 ÷ 10 = 1.5.
Always label the y-axis 'Frequency Density' on a histogram, not 'Frequency'. If all class widths are equal, the histogram looks like a bar chart, but the area of each bar still represents the frequency.
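As a quick check of the two formulas, here is a minimal Python sketch applied to the class 20 ≤ x < 30 with frequency 15 from the example above. The function names are made up for this illustration.

```python
def frequency_density(lower, upper, frequency):
    """Height of the histogram bar: frequency divided by class width."""
    return frequency / (upper - lower)


def frequency_from_density(lower, upper, density):
    """Area of the histogram bar: frequency density times class width."""
    return density * (upper - lower)


print(frequency_density(20, 30, 15))        # 1.5
print(frequency_from_density(20, 30, 1.5))  # 15.0
```

The second function reverses the first, which is how a frequency is recovered when reading a histogram.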
Cumulative Frequency
A cumulative frequency diagram allows us to estimate the median, quartiles, and interquartile range from grouped data.
Cumulative frequency: A running total of frequencies. Plot cumulative frequency against the upper class boundary of each interval.
- Median: read off at n/2 on the cumulative frequency axis
- Lower Quartile (Q1): read off at n/4
- Upper Quartile (Q3): read off at 3n/4
- IQR = Q3 − Q1
- Percentiles: read off at the appropriate fraction of n
Always plot cumulative frequency at the UPPER class boundary, not the mid-point. Draw a smooth S-shaped curve through the points.
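The sketch below builds the points to plot and then estimates the median and quartiles. It assumes straight-line interpolation inside each class instead of reading off a hand-drawn smooth curve, so its answers may differ slightly from values read off a graph. The data and function names are invented for illustration.

```python
from itertools import accumulate


def cumulative_points(classes):
    """Return (upper class boundary, cumulative frequency) pairs to plot."""
    uppers = [upper for _, upper, _ in classes]
    running = list(accumulate(freq for _, _, freq in classes))
    return list(zip(uppers, running))


def read_off(classes, target):
    """Estimate the value at which the cumulative frequency reaches `target`.

    Assumption: values are spread evenly within each class, so the
    'curve' is a straight line across the class.
    """
    cum_before = 0
    for lower, upper, freq in classes:
        if freq > 0 and cum_before + freq >= target:
            fraction = (target - cum_before) / freq
            return lower + fraction * (upper - lower)
        cum_before += freq
    raise ValueError("target is larger than the total frequency")


# Hypothetical grouped data: (lower, upper, frequency)
classes = [(0, 10, 4), (10, 20, 8), (20, 30, 6), (30, 40, 2)]
n = sum(freq for _, _, freq in classes)  # n = 20

print(cumulative_points(classes))  # [(10, 4), (20, 12), (30, 18), (40, 20)]

median = read_off(classes, n / 2)   # read off at 10
q1 = read_off(classes, n / 4)       # read off at 5
q3 = read_off(classes, 3 * n / 4)   # read off at 15
print(median, q1, q3, q3 - q1)      # 17.5 11.25 25.0 13.75
```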
Box Plots
A box plot (box-and-whisker diagram) displays the five-number summary of a dataset and allows easy comparison between distributions.
- Minimum value
- Lower Quartile (Q1)
- Median (Q2)
- Upper Quartile (Q3)
- Maximum value
Outlier: A value that lies more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3. Outliers are plotted as separate crosses (×) beyond the whiskers.
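The outlier rule can be sketched in Python as follows. The quartiles are assumed to be given already, because different methods of finding quartiles can give slightly different values. The data and function names are invented for illustration.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper limits beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """List the values lying below the lower limit or above the upper limit."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Hypothetical data with assumed quartiles Q1 = 15 and Q3 = 21
values = [2, 14, 15, 17, 18, 20, 21, 23, 40]

print(outlier_fences(15, 21))         # (6.0, 30.0)
print(find_outliers(values, 15, 21))  # [2, 40]
```

Here IQR = 6, so any value below 15 − 9 = 6 or above 21 + 9 = 30 is plotted as a cross.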
When comparing two box plots, comment on: (1) the median — which group has a higher typical value? (2) the IQR — which group has more spread? (3) the range — which group has more overall spread? (4) skewness — is the distribution symmetric or skewed?
Stem-and-Leaf Diagrams
A stem-and-leaf diagram retains the original data values while organising them in order. It is useful for small datasets.
Back-to-back stem-and-leaf diagram: Two datasets share the same stem, with the leaves of one dataset going to the left and the leaves of the other going to the right. This allows direct comparison of two distributions.
Always include a key on a stem-and-leaf diagram. For example: '3 | 4 means 34'. Leaves must be written in order (smallest to largest away from the stem).
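A minimal Python sketch of how a simple (one-sided) stem-and-leaf diagram is built is shown below. It assumes the data are non-negative whole numbers where the tens form the stem and the units form the leaf. The data and function name are invented for illustration.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf diagram as text lines.

    Assumption: stem = tens digit(s), leaf = units digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):          # sorting keeps the leaves in order
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)

    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems too
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append("Key: 3 | 4 means 34")
    return lines


# Hypothetical data
for line in stem_and_leaf([34, 21, 47, 25, 38, 21, 40]):
    print(line)
# 2 | 1 1 5
# 3 | 4 8
# 4 | 0 7
# Key: 3 | 4 means 34
```

Repeated values such as 21 appear as repeated leaves, so no original data value is lost.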
Pie Charts
A pie chart shows proportions as sectors of a circle. The angle of each sector is proportional to the frequency.
Angle = (Frequency ÷ Total frequency) × 360°
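The angle formula can be applied to a whole table at once, as in this Python sketch. The categories, frequencies and function name are invented for illustration.

```python
def sector_angles(frequencies):
    """Return the sector angle in degrees for each category."""
    total = sum(frequencies.values())
    # (frequency / total) * 360, written as frequency * 360 / total
    return {label: freq * 360 / total for label, freq in frequencies.items()}


# Hypothetical survey data, total frequency 36
travel = {"Walk": 12, "Bus": 18, "Car": 6}

angles = sector_angles(travel)
print(angles)                # {'Walk': 120.0, 'Bus': 180.0, 'Car': 60.0}
print(sum(angles.values()))  # 360.0
```

Checking that the angles add up to 360° is a useful way to catch arithmetic slips.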
Comparative pie charts: Two pie charts where the area of each circle is proportional to the total frequency it represents. The radius is therefore proportional to the square root of the total frequency.
r₂/r₁ = √(n₂/n₁)
Where r = radius, n = total frequency
For comparative pie charts, you must adjust the radius — not just draw two pie charts of the same size. The area represents the total, so area ∝ n, meaning radius ∝ √n.
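Rearranging the ratio gives r₂ = r₁ × √(n₂ ÷ n₁), which this Python sketch applies. The radius, totals and function name are invented for illustration.

```python
import math


def scaled_radius(r1, n1, n2):
    """Radius of the second pie chart so that area is proportional to total.

    r1: radius of the first chart, n1 and n2: the two total frequencies.
    """
    return r1 * math.sqrt(n2 / n1)


# Hypothetical example: first chart has radius 3 cm and total 100,
# second chart has total 400 (four times as many)
print(scaled_radius(3, 100, 400))  # 6.0
```

Four times the total frequency gives four times the area but only twice the radius.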