4

Scatter Diagrams and Correlation

Scatter diagrams, lines of best fit, Spearman's rank correlation, PMCC

Scatter Diagrams

A scatter diagram plots pairs of values (x, y) to investigate whether there is a relationship (correlation) between two variables.

Correlation

A measure of the strength and direction of the linear relationship between two variables.

Positive Correlation

As one variable increases, the other also increases. The points slope upwards from left to right.

Negative Correlation

As one variable increases, the other decreases. The points slope downwards from left to right.

No Correlation

There is no clear linear relationship between the variables. The points are scattered with no overall upward or downward trend.

Outlier

A point that does not fit the general pattern of the scatter diagram. It should be identified and investigated.

Correlation ≠ Causation

Just because two variables are correlated does not mean one causes the other. There may be a third variable (confounding variable) that explains the relationship.

Line of Best Fit

A line of best fit (regression line) summarises the relationship between two variables and can be used to make predictions.

Line of Best Fit

A straight line drawn through the mean point (x̄, ȳ) with roughly equal numbers of points above and below the line.

Interpolation

Using the line of best fit to predict a value within the range of the data. This is generally reliable, provided the correlation is reasonably strong.

Extrapolation

Using the line of best fit to predict a value outside the range of the data. This is unreliable as the relationship may not continue.
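
The distinction between interpolation and extrapolation can be sketched in a few lines of Python. This is only an illustration: the function name predict, the data and the line y = 5x + 40 are made up for the example and are not taken from any exam question.

```python
def predict(x, slope, intercept, x_data):
    """Predict y from a line of best fit and label the kind of prediction."""
    y = slope * x + intercept
    # Inside the observed x-range -> interpolation; outside -> extrapolation.
    if min(x_data) <= x <= max(x_data):
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return y, kind


# Hypothetical data: hours revised (x), with an assumed line y = 5x + 40.
hours = [1, 2, 4, 6, 8]
print(predict(5, 5, 40, hours))   # (65, 'interpolation')
print(predict(12, 5, 40, hours))  # (100, 'extrapolation')
```

The second prediction lies beyond the largest observed x-value (8 hours), so it should be treated with caution.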

Exam Tip

The line of best fit must pass through the mean point (x̄, ȳ). Always calculate and plot this point first before drawing the line.
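
When the line is calculated rather than drawn by eye, the usual method is least squares regression, and that line always passes through the mean point. The sketch below checks this for a small made-up data set; the names least_squares_line, s_xy and s_xx are illustrative choices, not notation from these notes.

```python
def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (slope, intercept, mean point) of the least squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # Choosing the intercept this way forces the line through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept, (x_bar, y_bar)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, (x_bar, y_bar) = least_squares_line(xs, ys)
print((x_bar, y_bar))                        # mean point (3.0, 4.0)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

Substituting x̄ = 3 into y = 0.6x + 2.2 gives y = 4, which is ȳ, so the mean point lies on the line.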

Spearman's Rank Correlation Coefficient

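Spearman's Rank Correlation Coefficient (SRCC) measures the strength and direction of the correlation between two variables using their ranks rather than their actual values. Because it works on ranks, it detects any consistently increasing or decreasing relationship, not only a linear one.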

Spearman's Rank Formula
rₛ = 1 − (6Σd²) ÷ (n(n²−1))

Where d = difference in ranks for each pair, n = number of data pairs
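
As a worked example of the formula, the sketch below computes rₛ for two made-up sets of ranks (two judges ranking five entries). The function name spearman_from_ranks and the data are assumptions for illustration, and the ranks are assumed to contain no ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Apply r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) to two lists of ranks."""
    n = len(rank_x)
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_sq) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five entries.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

# d = -1, 1, -1, 1, 0, so sum of d^2 = 4 and n(n^2 - 1) = 5 * 24 = 120.
r_s = spearman_from_ranks(judge_a, judge_b)
print(round(r_s, 2))  # 0.8
```

Here rₛ = 1 − 24 ÷ 120 = 0.8, so the two judges broadly agree.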
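
Interpreting rₛ

  • rₛ = +1: perfect positive correlation (the two sets of ranks agree exactly)
  • rₛ = −1: perfect negative correlation (the two sets of ranks are exactly reversed)
  • rₛ = 0: no correlation
  • 0.8 ≤ rₛ < 1: strong positive correlation
  • 0.5 ≤ rₛ < 0.8: moderate positive correlation
  • 0 < rₛ < 0.5: weak positive correlation
  • Negative values are interpreted in the same way, with the direction reversed.

Tied Ranks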

If two values are equal, give them the average of the ranks they would have occupied. For example, if two values tie for 3rd and 4th place, both get rank 3.5.
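
The averaging rule for tied ranks can be sketched as follows. The function name average_ranks, the choice to rank the largest value as 1 and the scores are all assumptions made for this illustration.

```python
def average_ranks(values, descending=True):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend 'end' across the run of values equal to the one at 'pos'.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end hold ranks pos+1..end+1; share their mean.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical test scores: the two 70s tie for 3rd and 4th place.
scores = [90, 85, 70, 70, 60]
print(average_ranks(scores))  # [1.0, 2.0, 3.5, 3.5, 5.0]
```

Note that when ranks are tied, the 6Σd² formula gives only an approximate value of rₛ.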

Product Moment Correlation Coefficient (PMCC)

The PMCC (Pearson's r) measures the strength of the linear correlation between two variables using the actual data values.

PMCC (r)

A value between −1 and +1. r = +1 means perfect positive linear correlation; r = −1 means perfect negative linear correlation; r = 0 means no linear correlation.
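
These notes do not state a formula for the PMCC. A standard one is r = Sxy ÷ √(Sxx × Syy), where Sxy, Sxx and Syy are sums of products of deviations from the means. The sketch below applies it to made-up data; the function name pmcc and the variable names are illustrative only.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data lying exactly on a straight line.
xs = [1, 2, 3, 4, 5]
print(round(pmcc(xs, [2, 4, 6, 8, 10]), 2))  # 1.0  (perfect positive)
print(round(pmcc(xs, [10, 8, 6, 4, 2]), 2))  # -1.0 (perfect negative)
```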

SRCC vs PMCC

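Use SRCC when: the data is already ranked, the data is not normally distributed, or there are outliers (ranks are less affected by extreme values). Use PMCC when: the data is continuous and approximately normally distributed with no extreme outliers, and a linear relationship is being tested.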
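
The effect of an outlier on each coefficient can be illustrated with a small made-up data set in which the last y-value is extreme. This is a sketch under stated assumptions: the helper names and data are hypothetical, the ranking helper assumes no tied values, and the PMCC uses the standard Sxy ÷ √(Sxx × Syy) form.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def simple_ranks(values):
    """Rank from smallest (1) to largest; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def srcc(xs, ys):
    """Spearman's r_s using 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d_sq = sum((a - b) ** 2 for a, b in zip(simple_ranks(xs), simple_ranks(ys)))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))


# Hypothetical data: y rises steadily with x, but the last value is an outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 4, 5, 60]
print(round(pmcc(xs, ys), 2))  # about 0.70: pulled down by the outlier
print(srcc(xs, ys))            # 1.0: the rank order is unchanged
```

The outlier changes the actual values dramatically but not the order of the data, which is why the rank-based coefficient is unaffected here.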