ANNEX F: STATISTICAL SIGNIFICANCE AND RELIABILITY
Statistical significance
F.1 The procedure used for testing whether differences between sub-groups are statistically significant is as follows.
First, the standard deviations for the two sub-groups are calculated as SD1 and SD2. Then:
1) Calculate an "overall" or "pooled" SD for the two groups together. This is very close to the average of SD1 and SD2, weighted by the relative sizes of the sub-groups in the sample.

2) Use this pooled measure to calculate the Standard Error of the Difference (SED) between the sub-group means.

3) Divide the observed difference between the sub-group scores by the SED. If the absolute size of this result (technically referred to as the "t-score") is greater than 1.96 (i.e. the result is either less than -1.96 or greater than +1.96), then the difference is statistically significant at the 95% confidence level. In other words, there is sufficient evidence that scores in the underlying population are different for the two sub-groups.
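The three steps above can be sketched in Python. This is a minimal illustration that assumes the standard pooled-variance form of the two-sample test, which is what the steps appear to describe; the annex does not give the exact expressions. The function names, the symbols n1 and n2 (sub-group sizes) and the example means and SDs are hypothetical and are not survey results.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Step 1: pooled SD (assumed standard pooled-variance form)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

def t_score(mean1, sd1, n1, mean2, sd2, n2):
    """Steps 2 and 3: difference in means divided by the SED."""
    sed = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / sed

def is_significant(t, critical=1.96):
    """Significant at the 95% level if |t| exceeds 1.96."""
    return abs(t) > critical

# Hypothetical means and SDs for two sub-groups of 563 and 614 respondents.
t = t_score(mean1=7.4, sd1=2.0, n1=563, mean2=7.1, sd2=2.1, n2=614)
print(round(t, 2), is_significant(t))
```

When the two SDs are equal, the pooled SD equals that common value, which is a quick sanity check on step 1.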

Statistical reliability
F.2 The respondents to the questionnaire are only a sample of the total 'population'. We cannot therefore be certain that the figures obtained are exactly those we would have if everybody had been interviewed (the 'true' values). However, we can predict the variation between the sample results and the 'true' values from a knowledge of the size of the samples on which the results are based and the number of times that a particular answer is given.
F.3 The confidence with which we can make this prediction is usually chosen to be 95% - that is, the chances are 19 in 20 that the 'true' value will fall within a specified range. The table below illustrates the predicted ranges for different sample sizes and percentage results at the '95% confidence interval', based on a random sample.
Table F.1: Predicted ranges for different sample sizes at the 95% confidence interval
Approximate sampling tolerances applicable to percentages at or near these levels (± percentage points)

| Size of sample on which survey result is based | 10% or 90% (±) | 30% or 70% (±) | 50% (±) |
|---|---|---|---|
| 100 interviews | 6 | 9 | 10 |
| 200 interviews | 4 | 6 | 7 |
| 300 interviews | 3 | 5 | 6 |
| 500 interviews | 3 | 4 | 4 |
| 1,000 interviews | 2 | 3 | 3 |
| 1,177 interviews | 2 | 3 | 3 |
F.4 For example, on a question where 50% of the people in a sample of 1,177 respond with a particular answer, the chances are 95 in 100 that this result would not vary by more than three percentage points, plus or minus, from the result of a complete coverage of the entire population using the same procedures. However, while it is correct to conclude that the "actual" result (95 times out of 100) lies somewhere between 47% and 53%, it is proportionately more likely to be closer to the centre of this band (i.e. at 50%).
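The tolerances quoted for a single sample are consistent with the usual normal-approximation formula for a proportion from a simple random sample, 1.96 × √(p(1 − p)/n). The annex does not state the formula it used, so the sketch below is an assumption, and the function name `sampling_tolerance` is hypothetical.

```python
import math

def sampling_tolerance(p, n, z=1.96):
    """Approximate 95% tolerance, in percentage points, for a proportion p
    (between 0 and 1) observed in a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Tolerances at 10%, 30% and 50% for the sample sizes used in this annex.
for n in (100, 200, 300, 500, 1000, 1177):
    print(n, [round(sampling_tolerance(p, n)) for p in (0.1, 0.3, 0.5)])
```

Because p(1 − p) is symmetric around 50%, the same tolerance applies at 10% and 90%, and at 30% and 70%.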
F.5 Tolerances are also involved in the comparison of results from different parts of a sample. A difference, in other words, must be of at least a certain size to be considered statistically significant. The following table is a guide to the sampling tolerances applicable to comparisons.
Table F.2: Sampling tolerances
Differences required for significance at or near these percentage levels (± percentage points)

| Size of samples compared | 10% or 90% (±) | 30% or 70% (±) | 50% (±) |
|---|---|---|---|
| 100 and 100 | 8 | 13 | 14 |
| 200 and 200 | 6 | 9 | 10 |
| 200 and 400 | 5 | 8 | 9 |
| 200 and 500 | 5 | 8 | 8 |
| 500 and 500 | 4 | 6 | 6 |
| 700 and 300 | 4 | 6 | 7 |
| 700 and 400 | 4 | 6 | 6 |
| 1,000 and 100 | 8 | 13 | 14 |
Table F.3: Demographic sub-group comparisons
Differences required for significance at or near these percentage levels (± percentage points)

| Size of samples compared | 10% or 90% (±) | 30% or 70% (±) | 50% (±) |
|---|---|---|---|
| Males vs. females (563 vs. 614) | 3 | 5 | 6 |
| Age 16-24 vs. 65-74 (168 vs. 135) | 7 | 10 | 11 |
| Easy to manage on income vs. difficult (531 vs. 182) | 5 | 8 | 8 |
| Good or very good general health vs. bad or very bad general health (873 vs. 96) | 6 | 10 | 11 |
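As a rough guide to how such comparison tolerances can be derived, the sketch below assumes two independent simple random samples and the normal-approximation formula 1.96 × √(p(1 − p)(1/n1 + 1/n2)). The annex does not state its formula, so this is an assumption; under it, the results agree with the demographic sub-group figures tabulated above. The function name `required_difference` is hypothetical.

```python
import math

def required_difference(p, n1, n2, z=1.96):
    """Approximate difference, in percentage points, needed for significance
    at the 95% level when comparing two independent samples of sizes n1 and
    n2, with both percentages at or near p (between 0 and 1)."""
    return 100 * z * math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Sub-group sizes taken from the demographic comparisons in this annex.
comparisons = {
    "Males vs. females": (563, 614),
    "Age 16-24 vs. 65-74": (168, 135),
    "Easy vs. difficult to manage on income": (531, 182),
    "Good vs. bad general health": (873, 96),
}
for label, (n1, n2) in comparisons.items():
    print(label, [round(required_difference(p, n1, n2)) for p in (0.1, 0.3, 0.5)])
```

The smaller of the two samples dominates the result, which is why a comparison involving a group of fewer than 100 respondents needs a difference of around ten points or more at mid-range percentages.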