Annex I: Survey design and methodology
I.1 Introduction
The Scottish Survey of Achievement 2006 was required to meet the range of high level objectives outlined in Section A. In addition, the following practical constraints were imposed where possible.
- The duration of an assessment session was designed to last about 40 minutes at P3/P5 and 60 minutes at P7/S2.
- There would be a maximum of three assessment sessions per pupil (though extra time was assumed for completion of questionnaires).
- The schools that had been invited to participate in the pre-testing of assessment material for the survey should not be selected for survey involvement, unless absolutely unavoidable.
- To further minimise the burden on schools, there should be as little overlap as possible between the schools selected for the SSA 2006 and those selected for the international Progress in International Reading Literacy Study (PIRLS), which ran in the same year. Where an overlap could not be avoided, PIRLS schools were excluded from the practical elements of the survey and from the numeracy tests.
- Wherever practicable, the total number of pupils selected for testing in an individual school should not exceed 20 for primary schools and 30 for secondary schools.
- A maximum of 12 pupils per school should be selected for participation in the practical elements of the survey.
- Wherever possible, pupils should be drawn from one stage only in primary schools.
This annex explains how the sample was designed in order to best meet these objectives and constraints, and also explains how the results were analysed.
I.2 The sample design
The SSA is principally intended to produce attainment estimates for the population of pupils at a stage across Scotland, whether taught in the publicly funded or the independent sector, however large or small their schools, and wherever they might be located. The only pupils deliberately excluded in the 2006 survey were those being taught in Gaelic units, and those in special schools. Pupils with special educational needs who were being taught in mainstream schools were not excluded, although schools could use their discretion and withdraw such pupils from their samples, before or during testing, should they consider the experience potentially or actually distressing for them.
In order to meet the survey objective of providing attainment estimates at local authority level, it was necessary to increase the pupil sample sizes that would normally be available within a representative national pupil sample for each affected authority. In 2005, in order to minimise the inevitably increased survey burden on schools, the decision was made to report on only half the 32 authorities that year, with the other half being reported in 2006. The 16 local authorities to have separate attainment reporting in this first, 2005, SSA (see Table 1) were not randomly selected, but were identified by HMIE on the basis of their preparedness to make best use for their own system evaluation purposes of the attainment data that would be produced for their pupils. Although not necessarily selected to be representative of all 32 Scottish local authorities, the set of 16 reporting authorities in 2005 did nevertheless include authorities from across the country, large and small, urban and rural, socially deprived and socially advantaged. The same is the case for 2006.
But what should be the extent of the sample boosting in order to facilitate reporting at a local authority level? When producing estimated population proportions on the basis of simple random samples, a sample size of 1,000 pupils would produce an estimate with a maximum associated margin of error of around three percentage points. So, we might say that the estimated proportion of P3 pupils deemed to be working at Level B in numeracy is 57% plus or minus 3%, having assessed 1,000 P3 pupils. With a sample size of 500 pupils the margin of error would increase to more than four percentage points. With 250 pupils the margin of error would be around six percentage points.
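The margins of error quoted above can be reproduced with a short calculation. The sketch below assumes a 95% confidence level (z = 1.96) and the worst-case proportion of 50%, which is the usual convention for a "maximum" margin of error under simple random sampling; the function name is illustrative only.

```python
import math

def max_margin_of_error(n, z=1.96):
    """Half-width of the confidence interval for a proportion of 0.5,
    estimated from a simple random sample of n pupils."""
    return z * math.sqrt(0.5 * 0.5 / n)

# Margins of error, in percentage points, for the sample sizes discussed above.
margins = {n: round(100 * max_margin_of_error(n), 1) for n in (1000, 500, 250)}
print(margins)  # {1000: 3.1, 500: 4.4, 250: 6.2}
```

Because the margin shrinks only with the square root of the sample size, halving the sample from 1,000 to 500 pupils widens the margin by a factor of about 1.4 rather than 2.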
A decision was made in 2005 to aim for sample sizes of around 450 pupils in each reporting authority 4, to give authority attainment estimates with margins of error of around five percentage points 5. It was further decided to select a total of 1,600 pupils at each stage to represent the group of 16 non-reporting authorities; that is an average of 100 pupils per authority, the actual number per authority reflecting that authority's population size. The group of independent schools would be represented by 100 pupils at each stage. In practice, these pupil numbers were increased slightly to allow for an estimated 10% or so pupil loss through absence.
Table 1
SSA Reporting Authorities for 2005 and 2006
| 2005 | 2006 |
|---|---|
| Aberdeen City | Aberdeenshire |
| Angus | Argyll & Bute |
| East Ayrshire | Clackmannanshire |
| East Dunbartonshire | Dumfries & Galloway |
| East Renfrewshire | Dundee City |
| Edinburgh City | East Lothian |
| Highland | Eilean Siar |
| Inverclyde | Falkirk |
| North Ayrshire | Fife |
| North Lanarkshire | Glasgow City |
| Perth & Kinross | Midlothian |
| Renfrewshire | Moray |
| South Ayrshire | Orkney Islands |
| South Lanarkshire | Scottish Borders |
| Stirling | Shetland Islands |
| West Lothian | West Dunbartonshire |
I.2.1 Sampling in non-reporting authorities and in the independent sector
The sixteen non-reporting authorities were treated as a single group for sampling purposes, with the independent sector forming a separate group. In the non-reporting authority group, a two-stage proportionate sampling scheme was applied to produce the 1,600 pupils needed at each stage, with separate school samples drawn without replacement for the three primary stages.
Before sampling began, publicly funded schools in the non-reporting authority group were first classified by authority (16 of these) and by size (two size groups: less than 20 pupils and 20+ pupils in the relevant stage in the primary sector; less than 30 pupils and 30+ pupils in the secondary sector). The intention behind the size stratification was that any selected small schools would, for their own convenience, be asked to provide all their relevant pupils for assessment. Because every relevant pupil in selected small schools would therefore have the same, 100%, chance of selection, the schools, too, were selected with equal selection probabilities. This strategy gives every pupil in every school in a 'small school' stratum the same probability of selection, and therefore in principle produces an unbiased sample of 'small school' pupils.
In the group of larger schools at each stage in each authority, schools were drawn with probabilities of selection proportional to stage size, and each selected school was asked to provide the same number of randomly selected pupils for assessment at the stage concerned (20 in primaries and 30 in secondaries). Again, this strategy gave every pupil in a 'large school' stratum an equal chance of selection, in principle producing an unbiased pupil sample.
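The equal-probability property of this two-stage design can be checked with exact arithmetic. The sketch below is illustrative only: the stage rolls are hypothetical, and it assumes a fixed-size design in which a school's inclusion probability is the number of schools drawn multiplied by its share of the stratum's stage roll (which requires that no school is so large that this product exceeds 1).

```python
from fractions import Fraction

def pupil_selection_probabilities(stage_rolls, n_schools, pupils_per_school):
    """Overall selection probability of a pupil in each school of a stratum."""
    total = sum(stage_rolls.values())
    probs = {}
    for school, roll in stage_rolls.items():
        p_school = Fraction(n_schools * roll, total)   # first stage: PPS
        p_pupil = Fraction(pupils_per_school, roll)    # second stage: fixed take
        probs[school] = p_school * p_pupil
    return probs

# Hypothetical 'large school' primary stratum (all stage rolls are 20+).
rolls = {"School A": 25, "School B": 40, "School C": 60, "School D": 75}
probs = pupil_selection_probabilities(rolls, n_schools=2, pupils_per_school=20)
print(probs)  # every value is Fraction(1, 5)
```

The school's stage roll cancels between the two stages, so every pupil in the stratum has the same chance of selection (here 2 x 20 / 200) whatever the size of their school.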
The numbers of pupils at each stage selected to represent the various strata (school size groups in 16 authorities) reflected respective pupil population sizes. In other words, the sampling was proportionate.
No school stratification was imposed in the independent sector before sampling was carried out. Rather, the requirement for around 100 pupils at a stage, with 20 in each primary school and 30 in each secondary school (where available), determined the number of schools needed, and these were selected with probability of selection proportional to size.
I.2.2 Sampling in reporting authorities
In each reporting authority, the intention was to draw a sample of around 450 pupils, to allow for a typical 10% loss of pupils through absence on assessment days.
Some of the reporting authorities (Aberdeenshire, Fife and Glasgow City) are sufficiently large in terms of school and pupil numbers that it was possible to use the non-reporting sampling strategy to produce their pupil samples. Within each of these authorities, maintained schools were stratified by size (as above) prior to sampling, and the same two sampling strategies described above for the non-reporting authorities were applied, with separate pupil samples being drawn at each of the four stages and no school selected at more than one stage.
In the remaining thirteen reporting authorities, a different sampling strategy was needed, because in most of these authorities there were simply too few schools available in either sector for the constraint on pupil numbers per school (a maximum of 20 pupils at a single primary stage and 30 pupils at S2) to be met. In these cases all the schools in the authority in both sectors were selected by default for survey participation, and primary schools had to provide pupils at all three stages for assessment.
Given their unavoidable inclusion in the authority samples, every school in each of the thirteen smaller authorities had a 100% chance of survey selection. To produce unbiased pupil samples for each of these authorities, all the schools in an authority therefore had to provide the same proportion of their pupils for assessment, rather than a fixed number. This proportion, the sampling fraction, was the required 450 pupils expressed as a proportion of the authority's pupil population at the stage concerned. The sampling fraction varied from authority to authority and from stage to stage (e.g. 49% at P3 in Argyll & Bute, 33% at S2 in Scottish Borders).
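As a rough illustration of the sampling fraction, the sketch below applies a single fraction to every school in a hypothetical authority. The school names and stage rolls are invented, and rounding each school's share to the nearest whole pupil is an assumption made here for illustration; the survey's actual adjustment rule is not described in this annex.

```python
import math

def sampling_fraction(target, stage_population):
    """Share of an authority's stage population needed to reach the target."""
    return min(1.0, target / stage_population)

def pupils_per_school(stage_rolls, target):
    """Apply the same fraction to every school, rounding halves upwards."""
    fraction = sampling_fraction(target, sum(stage_rolls.values()))
    return {school: math.floor(fraction * roll + 0.5)
            for school, roll in stage_rolls.items()}

# Hypothetical authority with 900 pupils at the stage, so the fraction is 50%.
rolls = {"School A": 410, "School B": 305, "School C": 185}
sample = pupils_per_school(rolls, target=450)
print(sample, sum(sample.values()))
# {'School A': 205, 'School B': 153, 'School C': 93} 451
```

The total of 451 rather than 450 shows why achieved authority sample sizes can differ slightly from the target: each school can only contribute a whole number of pupils.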
A number of reporting authorities were quite small, containing fewer than the required 450 pupils per stage. In these cases, almost every pupil within the relevant stage in the authority was included. These authorities were: Eilean Siar, Orkney and Shetland Islands. Finally, Clackmannanshire authority requested that all of their pupils at the relevant stages be included in the survey.
To make survey involvement slightly less burdensome for the schools, in primary schools with fewer than ten pupils at the three stages combined, all the pupils were automatically included in the sample.
I.2.3 Summary of sampling strategies
Non-reporting authorities
- Around 1,600 pupils were selected at random at each stage from publicly funded schools to represent the whole group of 16 authorities, through 2-stage proportionate stratified sampling.
- The school population was stratified by authority and school size (stage size: <20 and 20+ for primaries, <30 and 30+ for secondaries) prior to sampling.
- Separate school samples were drawn for each stage, with no overlap in the primary samples.
- In the small school-size strata, schools were selected by simple random sampling (equal probabilities of selection), with all pupils at the relevant stage automatically selected for assessment.
- In the large school-size strata, schools were selected by pps sampling (probability of selection proportional to size of stage) and then 20 (primary stages) or 30 (S2) pupils were selected at random from within each school for assessment.
Largest reporting authorities
- Around 450 pupils were selected at random at each stage from publicly funded schools to represent the individual authority, through 2-stage proportionate stratified sampling.
- The authority's school population was stratified by school size (stage size: <20 and 20+ for primaries, <30 and 30+ for secondaries) prior to sampling.
- Separate school samples were drawn for each stage, with no overlap in the primary samples.
- In the small school-size strata, schools were selected by simple random sampling (equal probabilities of selection), with all pupils at the relevant stage automatically selected for assessment.
- In the large school-size strata, schools were selected by pps sampling (probability of selection proportional to size of stage) and then twenty (primary stages) or thirty (S2) pupils were selected at random from within each school for assessment.
Other reporting authorities
- Around 450 pupils were selected at random at each stage from publicly funded schools to represent the individual authority, through proportionate sampling.
- No school sampling was involved in either sector, since every school needed to participate.
- Primary schools provided pupils at all three stages (P3, P5 and P7).
- A fixed proportion of pupils was randomly selected at each relevant stage from within each school, the proportion being given by the 450 pupils needed divided by the number available in the authority's pupil population at the stage concerned.
- In primary schools with fewer than ten pupils in total across the three stages, all the pupils were selected for assessment.
Independent schools
- At each stage around 100 pupils were randomly selected for assessment, using 2-stage sampling.
- Separate school samples were drawn for each stage.
- Schools were selected by pps sampling, and a fixed number of pupils then selected from within each selected school for assessment: twenty pupils at the relevant stage in primaries and thirty S2 pupils in secondaries.
- Schools with fewer than twenty (primary stages) or thirty (S2) pupils were to provide all their pupils for assessment.
The result of this complex sampling was an intended national pupil sample at each stage of around 8,500 pupils, or around 15% of the pupil population. The pupils were drawn from just over 1,350 different schools throughout the country: 1,134 primary schools and 222 secondary schools. Table 2 provides a detailed sample breakdown.
As the table shows, in the reporting authorities 820 primary schools and 159 secondary schools were selected, in principle contributing a total of around 26,600 pupils for assessment (between 6,600 and 6,750 at each stage). The total number of pupils selected in each school varied from one pupil to 227 pupils in the primary sector (P3, P5 and P7 combined), and from two pupils to 243 pupils in the secondary sector (S2). The authority target sample size of 445 pupils at each stage varied slightly from authority to authority, because the sampling fraction to be applied to each school's stage roll had to be dynamically adjusted in order to produce integer numbers of pupils.
Table 2
The intended pupil samples for written assessment in the 2006 SSA
(Numbers of schools and pupils selected for survey participation)
| Reporting authorities | Schools: Primary | Schools: Secondary | Pupils: P3 | Pupils: P5 | Pupils: P7 | Pupils: S2 | Pupils: Total |
|---|---|---|---|---|---|---|---|
| Aberdeenshire | 103 | 17 | 444 | 443 | 451 | 445 | 1,783 |
| Argyll & Bute | 80 | 10 | 445 | 445 | 445 | 445 | 1,780 |
| Clackmannanshire | 19 | 3 | 445 | 445 | 445 | 445 | 1,780 |
| Dumfries & Galloway | 71 | 16 | 446 | 446 | 446 | 445 | 1,783 |
| Dundee City | 41 | 10 | 445 | 445 | 445 | 445 | 1,780 |
| East Lothian | 35 | 6 | 445 | 445 | 445 | 445 | 1,780 |
| Eilean Siar | 39 | 11 | 305 | 305 | 305 | 362 | 1,277 |
| Falkirk | 48 | 8 | 445 | 445 | 445 | 445 | 1,780 |
| Fife | 78 | 19 | 450 | 446 | 449 | 445 | 1,790 |
| Glasgow City | 70 | 15 | 445 | 451 | 447 | 450 | 1,793 |
| Midlothian | 35 | 6 | 445 | 445 | 445 | 445 | 1,780 |
| Moray | 46 | 8 | 445 | 445 | 445 | 445 | 1,780 |
| Orkney Islands | 21 | 6 | 243 | 243 | 243 | 261 | 990 |
| Scottish Borders | 68 | 9 | 445 | 445 | 445 | 445 | 1,780 |
| Shetland Islands | 32 | 8 | 272 | 272 | 272 | 336 | 1,152 |
| West Dunbartonshire | 34 | 7 | 445 | 445 | 445 | 445 | 1,780 |
| Total for reporting authorities | 820 | 159 | 6,610 | 6,611 | 6,618 | 6,749 | 26,588 |
| Other authorities | | | | | | | |
| Aberdeen City | 16 | 3 | 100 | 95 | 120 | 90 | 405 |
| Angus | 12 | 2 | 66 | 77 | 80 | 60 | 283 |
| East Ayrshire | 12 | 3 | 80 | 68 | 78 | 90 | 316 |
| East Dunbartonshire | 12 | 2 | 80 | 80 | 79 | 60 | 299 |
| East Renfrewshire | 10 | 2 | 60 | 60 | 80 | 60 | 260 |
| Edinburgh City | 33 | 7 | 218 | 211 | 214 | 210 | 853 |
| Highland | 32 | 5 | 147 | 145 | 142 | 150 | 584 |
| Inverclyde | 9 | 2 | 49 | 60 | 60 | 60 | 229 |
| North Ayrshire | 14 | 3 | 80 | 93 | 100 | 90 | 363 |
| North Lanarkshire | 35 | 7 | 229 | 236 | 214 | 210 | 889 |
| Perth & Kinross | 15 | 3 | 89 | 88 | 89 | 90 | 356 |
| Renfrewshire | 18 | 4 | 120 | 108 | 120 | 120 | 468 |
| South Ayrshire | 11 | 2 | 62 | 80 | 60 | 60 | 262 |
| South Lanarkshire | 33 | 7 | 195 | 192 | 198 | 210 | 795 |
| Stirling | 10 | 2 | 55 | 58 | 60 | 60 | 233 |
| West Lothian | 20 | 4 | 118 | 117 | 115 | 120 | 470 |
| Total for non-reporting authorities | 292 | 58 | 1,748 | 1,768 | 1,809 | 1,740 | 7,065 |
| Independent schools | 22 | 5 | 104 | 109 | 116 | 119 | 448 |
| Scotland total | 1,134 | 222 | 8,462 | 8,488 | 8,543 | 8,608 | 34,101 |
In the group of 'non-reporting' authorities, 292 primary schools and 58 secondary schools were selected for survey participation, and within them a total of just over 7,000 pupils were randomly selected for assessment: roughly 1,750 to 1,800 per stage. In all of the 'non-reporting' authorities, the schools were all large enough to provide twenty or thirty pupils each (at primary and secondary respectively) for assessment.
In the independent sector, twenty-two schools were randomly selected to provide pupils for assessment at the primary stages and five schools were randomly selected to provide pupils for assessment at S2. Between them, these schools were to provide for assessment 448 randomly selected pupils across the four stages. This number was later revised downwards, because of an additional requirement on these schools to provide SEED with a complete list of pupils for sampling purposes. Some schools were unable to provide such a list and therefore had to be excluded from the survey for practical reasons.
The stage samples were by design disproportionate, with reporting authorities over-represented within them and non-reporting authorities under-represented. It follows that the intended SSA pupil samples were not self-weighting, and that during data analysis, when the estimated national attainment proportions were being calculated, appropriate adjustment (data weighting) would be required to compensate for the deliberate bias in authority representation. Further information is provided in section I.7 below.
I.3 Reading, numeracy and Social Subjects enquiry skills assessment
In most assessment situations, schools and pupils are not the only elements that are sampled. The test items and tasks which the pupils attempt are also essentially samples. They are samples of all the items and tasks that already exist or which could be developed to represent the abilities/skills being assessed (reading, numeracy, etc), i.e. to represent the relevant attainment 'domain'.
As with the 2005 SSA, the sample could not accommodate the test-based assessment of all of the skill areas identified as within the scope of the survey. For this reason, and also to address continuing concerns about the validity of assessing writing skills in the relatively artificial and time-constrained context of a national survey, it was decided to estimate writing attainment on the basis of class teachers' judgments rather than through in-survey testing, with a subset of submitted and rated writing evaluated through moderation. This left the assessment of Social Subjects enquiry skills, reading and numeracy to be accommodated within the written survey itself.
The constraint of three assessment sessions per pupil was met by assessing half of the pupils in the sample for reading and the other half for numeracy and Social Subjects. For those pupils involved in the reading assessments, each was required to complete three tasks (one at each of three consecutive levels). One reading task required an entire assessment session. The remaining pupils were required to complete two numeracy booklets (each test booklet containing items at three different levels) and one Social Subjects booklet. In this way, the constraint on the duration of an assessment session was also met.
'Multiple matrix sampling' was employed in the distribution of items and tasks among the pupils. Multiple matrix sampling is simply a strategy for ensuring that as many test items as possible are used in a survey, maximising curriculum coverage and therefore assessment validity, without any one pupil being required to attempt unacceptably long tests, or to be assessed over unacceptably long periods of time. Booklets were randomly allocated to pupils in such a way that as few pupils as possible would be faced with the same task or booklet in any particular school (minimising any possibility of school effects), whilst all tasks/booklets would eventually be attempted by similarly sized and similarly representative national and authority samples of pupils ('interpenetrating' or 'concurrent' samples).
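One simple way to achieve this kind of allocation is to rotate through the booklets continuously, shuffling the pupils within each school. The sketch below is a hypothetical illustration of the idea, not the survey's actual allocation procedure; the school, pupil and booklet labels are invented.

```python
import random
from collections import Counter

def allocate_booklets(schools, booklets, seed=0):
    """Assign one booklet to each pupil by rotating through the booklet list.

    The rotation carries on from one school to the next, so overall booklet
    counts differ by at most one, and pupils in the same school receive
    different booklets as far as the number of booklets allows.
    """
    rng = random.Random(seed)
    allocation = {}
    position = 0
    for school, pupils in schools.items():
        order = list(pupils)
        rng.shuffle(order)  # random order of pupils within the school
        for pupil in order:
            allocation[(school, pupil)] = booklets[position % len(booklets)]
            position += 1
    return allocation

schools = {"S1": ["a", "b", "c", "d", "e"], "S2": ["f", "g", "h"], "S3": ["i", "j", "k", "l"]}
booklets = ["B1", "B2", "B3", "B4"]
allocation = allocate_booklets(schools, booklets)
print(Counter(allocation.values()))  # each booklet is used three times
```

With twelve pupils and four booklets, each booklet reaches three pupils, and no booklet is repeated within a school until all four have been used there.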
More information about the tasks used is available in Annexes II.1, II.2 and II.3.
I.4 Writing assessments
As noted earlier, for reasons of survey pressure (reading, numeracy and Social Subjects given priority within a large but stretched survey sample) and authenticity (timed unsupported writing being considered less valid than in-class supported writing), no direct writing assessment took place within the 2006 survey itself. Instead, for a random third of the pupils in the survey sample at each stage, schools were invited to forward a piece of extended writing of a specified genre that would illustrate the level the pupil was working at currently: genres - 'personal', 'imaginative', 'functional' - were pre-allocated to pupils at random (essentially another example of multiple matrix sampling).
More information about the writing assessments is available in Annex II.4.
I.5 The sampling strategy for the practical assessments
Practically-based assessment is more costly and more logistically challenging than pencil and paper assessment, and for this reason it was decided that practical assessments would be undertaken in a subsample of the survey schools rather than in all of them. The results of the practical assessments would also be reported at national level only. For this reason, the practical pupil samples were to be nationally representative, i.e. there would be no over-representation of reporting authorities. Thus, if x% of the pupils in the country were in Authority X, then x% of the pupils in the nationally representative practical sample should also have been in Authority X.
Following practice in the 2005 SSA, and working on the basis of recruitment feasibility and cost, it was planned to recruit 160 practising teachers to work as itinerant field officers for the purpose of the practical assessment. These individuals would work in pairs, each pair spending a day in each of five assigned schools, organising and supervising pupil assessments, and sometimes making attainment judgments themselves. Clearly, 160 field officers in total, working in pairs, each pair visiting up to ten schools, suggests 800 school visits in total, or about half the schools in the main survey, with up to 9,600 pupils assessed in total over the four stages.
Schools were randomly selected for involvement in the practical assessments, but with two important constraints. To maximise use of the field officers' time, it was decided to select for practical assessments schools that were within easy travelling distance of the field officers' homes, and that had at least twenty pupils at a stage in their main survey sample, or at least twenty sample pupils in P3 and P5 combined. Clearly, these constraints meant that at the primary stages the resulting practical samples could never be faithfully representative of the national pupil populations, since they were by design biased in favour of larger primary schools. However, if we can assume that size of school is not a relevant factor in terms of the practical skills of pupils, then the performance findings that have emerged from the practical assessments will nevertheless be valid in reflecting national patterns of practical skills attainment.
Given the location constraint, it would not be possible to finalise the sub-sample of schools that would be asked to participate in the practical component of the survey until the final list of field officers was known. But a provisional sample of schools was drawn well before the survey took place, by randomly selecting schools with twenty or more pupils at one stage in their written survey sample, in appropriate numbers from each authority. The selected sub-sample was larger than needed, since it was expected that not every school in the list would be able to be visited, either because insufficient numbers of field officers would be available or because the school's location would prohibit a field officer visit.
All 32 local education authorities were invited to nominate practising teachers to serve as field officers. The numbers of field officers requested from each authority reflected the authority's relative size, in terms of pupil population. This is because assessing the x% of sample pupils from Authority X would require x% of the recruited field officers to be from Authority X, since, for efficiency reasons, field officers would generally be visiting schools in their own authorities.
In the event, 133 teachers were nominated from 29 of the 32 authorities to serve as field officers, and released from their schools for the required eight days each (a preparatory day, a training day, five days for school visits and a debriefing day). However, for a variety of reasons (e.g. illness) only 124 field officers were trained. Working as 62 pairs, the maximum number of schools that could be visited was 620, and the maximum possible number of pupils that could be tested over the four stages was just over 7,400.
In each 'practical' school, up to four pupils at the stage concerned were randomly selected for the assessment of Social Subjects enquiry skills, up to four for the assessment of ICT skills, and up to four for the assessment of skills in problem solving and working with others. Further detail about the practical tasks is available in Annex II.5.
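The capacity figures for the practical assessments follow from simple arithmetic. The sketch below takes two figures from the text as given: up to ten school visits per pair of field officers, and up to twelve pupils per school (four for each of the three practical components). The function name is illustrative.

```python
def practical_capacity(field_officers, visits_per_pair=10, pupils_per_school=12):
    """Maximum school visits and pupils assessed for a pool of field officers."""
    pairs = field_officers // 2            # officers work in pairs
    visits = pairs * visits_per_pair       # one visit per school
    pupils = visits * pupils_per_school    # up to 4 + 4 + 4 pupils per school
    return pairs, visits, pupils

print(practical_capacity(160))  # (80, 800, 9600): the planned capacity
print(practical_capacity(124))  # (62, 620, 7440): the capacity achieved
```

The loss of 36 planned field officers therefore reduced the maximum number of pupils who could be assessed by 2,160.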
For all assessments conducted within the practical component of the survey, attainment results are reported as field officer level judgments or as percentages of pupils. Findings are presented in Chapter F as sample statistics only, with no data weighting.
I.6 Participation rates
Selected schools were not obliged to take part in the survey, and in those schools that do so there are always some pupils absent on the assessment days. Table 3 presents the statistics on school participation in the main survey.
Table 3
School participation statistics*
| | P3 | P5 | P7 | S2 |
|---|---|---|---|---|
| Schools selected for participation | 722 | 734 | 716 | 222 |
| Schools withdrawn by authorities | 4 | 4 | 6 | 3 |
| Schools invited to participate | 718 | 730 | 710 | 219 |
| Schools agreeing to participate | 683 | 696 | 678 | 199 |
| Schools returning completed test booklets | 648 | 662 | 644 | 177 |
| Participation rate (%) among the selected schools | 90 | 90 | 90 | 80 |
| Schools that contributed pupil writing samples | 564 | 564 | 555 | 155 |
| Schools that participated in practical assessments | 148 | 142 | 160 | 122 |
| Schools that returned pupil questionnaires | 617 | 609 | 616 | 179 |
|---|
*For the written assessments, the majority of the primary schools in reporting authorities appear in the statistics for two or more of the primary stages
A very small number of schools were withdrawn by their authorities from the initial sample lists, principally because of staffing problems or school amalgamation/closure. Rather more schools declined the invitation to participate in the survey or failed to respond to the invitation by the due date. Where reasons were offered by schools for declining the invitation to participate, reasons given included staffing difficulties, HMIE inspections, involvement in other surveys/testing, accommodation problems and mergers. Among those schools that did agree to participate, a number failed to return completed test booklets.
The participation rate among the originally selected schools was 90% among primary schools and 80% among secondary schools, figures entirely in line with those for the 2005 survey. Interestingly, there was no evidence that schools asked to provide larger pupil samples were any more likely to decline to participate or to fail to return booklets. In total 977 primary and 177 secondary schools took part in the survey.
The numbers of pupils originally selected for participation in the assessment of reading or numeracy at the four stages were immediately reduced when schools were withdrawn from the survey sample by their authorities, and were reduced further as schools declined the invitation to participate. In addition, some schools that had agreed to participate did not in the event do so (completed tests were not returned), and this resulted in further losses in the pupil samples. Finally, in the schools that did undertake the assessments a very small number of pupils could not be assessed, because they had left the school since the sample was drawn, because they were withdrawn from the sample by the schools (a tiny number of special needs pupils), or because they were absent during the assessment period. Absence was the major contributor to pupil loss at this stage. The result of these losses is that in the primary stages just over 80% of the pupils originally selected for written assessment were actually assessed, compared with just over 75% at S2 (see Table 4), figures again entirely in line with those for the 2005 survey. In total 21,215 Primary and 6,581 Secondary pupils were assessed in the survey.
Table 4
Pupil participation statistics
| | P3 | P5 | P7 | S2 |
|---|---|---|---|---|
| Pupils originally selected for participation | 8,462 | 8,488 | 8,543 | 8,608 |
| Pupils actually assessed (reading, numeracy or social subjects enquiry skills)* | 6,914 | 7,065 | 7,236 | 6,581 |
| % of pupils originally selected | 82 | 83 | 85 | 76 |
| Pupils involved in the analysis of reading | 3,475 | 3,556 | 3,645 | 3,332 |
| Pupils involved in the analysis of numeracy** | 3,316 | 3,221 | 3,500 | 3,011 |
| Pupils involved in the analysis of social subjects | 3,334 | 3,424 | 3,521 | 3,053 |
| Pupils involved in the moderation of writing | 1,968 | 1,799 | 1,707 | 1,413 |
| Pupils involved in the practical assessments | 1,256 | 1,203 | 1,435 | 1,102 |
| Pupils returning completed questionnaires | 6,344 | 6,324 | 6,797 | 5,831 |
|---|
* P5 pupils in schools that were also participating in the PIRLS survey were allocated a single reading task or a social subjects task in place of the usual three reading tasks or two numeracy tests plus a social subjects task
** Pupils contributed to the analysis of numeracy only if they had attempted both numeracy tests that were assigned to them
Gender and deprivation imbalances were redressed during attainment estimation, through appropriate data weighting. More information about how this was done is provided in I.7 below.
I.7 Data weighting procedures
Due to survey non-response and national sample imbalances caused by the need for Local Authority reporting, the reading, numeracy and social subjects written test data needed to be weighted to produce nationally representative attainment results.
The weighting attached to each pupil comprised two components. The first part of the weighting adjusts for imbalances in the pupil sample within the school and is equal to the total number of pupils in the school who are in the same stage and have the same gender and deprivation score as the pupil divided by the number of those pupils who were included in the assessment.
The second part of the weighting adjusts for imbalances at the authority level and is equal to the total number of pupils in the authority with the same gender, deprivation score and stage as the pupil divided by the total number of such pupils who attended a school that participated in the assessments.
Multiplying these two weights together gives the pupil's overall weight. A more detailed explanation of the weighting methodology follows:
Since there are many variables involved in the computation of weights for this survey, use of conventional subscript notation would routinely result in expressions involving six or seven subscripts, which could be very difficult to read. In this section, therefore, square brackets are used rather than reduced-font subscripts. Thus the expression p_{iskgdv/b} will normally appear here as p[i,s,k,g,d,v/b].
The variables involved in the computation of weights for individual pupil results are as follows:
- School, designated s, ranging over all Scottish schools.
- Stages, designated k, drawn from the set {P3,P5,P7,S2}.
- Pupils within schools, designated i.
- Gender, designated g, drawn from the set {G,B,N}, standing for Girl, Boy and Not specified, respectively.
- Deprivation index, designated d, where
d = 1 if a pupil lies within deprivation decile 1 or 2
d = 2 if a pupil lies within deprivation deciles 3-10
d = 0 otherwise (typically unspecified).
- Level, designated v, drawn from the set {A,B,C,D,E,F}.
- Authority band, designated b. There are two categories of authority: the 16 reporting authorities, and the 16 non-reporting authorities. Reporting authorities were treated separately, each as a single band. Non-reporting authorities were considered together in a single band. Independent schools were also grouped together, regardless of their location, in a single band. Schools are, of course, completely nested in bands.

The basic quantity p[i,s,k,g,d,v/b] is an indicator, equal to 1 if pupil i in school s in band b, at stage k with gender g and deprivation index d, was tested at level v, and 0 otherwise.
Summation over a particular subscript is indicated by a dot. Thus p[.,s,k,.,.,v/b] denotes the total number of pupils in school s at stage k tested at level v in band b. For the special case of level, the dot represents aggregation over pupils tested at one or more levels; an asterisk is used here as a special notation to denote aggregation over all pupils, whether tested or not. Thus p[.,s,k,.,.,./b] denotes the total number of pupils tested at stage k in school s, while p[.,s,k,.,.,*/b] stands for the total pupil roll size for stage k in school s, including pupils not tested. Similarly, p[.,.,k,.,.,*/.] denotes the total size of the pupil population in Scotland at stage k.
As a convenient shorthand, a pupil at stage k with gender g and deprivation index d is referred to as belonging to the group kgd. This shorthand can also be extended to cover aggregates, so that, for example, the group k.. contains all pupils at stage k.
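The dot notation can be illustrated with a minimal Python sketch. It assumes each tested pupil is stored as a tuple (school, stage, gender, deprivation index, level); the function name `aggregate`, the record layout and the sample records are hypothetical and serve only to show how fixing some subscripts and summing over the rest works.

```python
def aggregate(records, school=None, stage=None, gender=None,
              deprivation=None, level=None):
    """Count pupils matching the fixed subscripts.

    An argument left as None plays the role of the dot: the count is
    summed over every value of that subscript.
    """
    pattern = (school, stage, gender, deprivation, level)
    return sum(
        1
        for record in records
        if all(want is None or want == got for want, got in zip(pattern, record))
    )


# Hypothetical tested pupils: (school, stage, gender, deprivation, level).
records = [
    ("s1", "P5", "G", 1, "C"),
    ("s1", "P5", "B", 2, "C"),
    ("s1", "P5", "G", 0, "D"),
    ("s2", "P5", "G", 1, "C"),
]

# Analogue of p[.,s,k,.,.,v/b]: pupils in school s1 at stage P5 tested at level C.
tested_at_c = aggregate(records, school="s1", stage="P5", level="C")

# Analogue of p[.,s,k,.,.,./b]: all tested pupils in school s1 at stage P5.
tested_any_level = aggregate(records, school="s1", stage="P5")
```

The roll totals marked with an asterisk cannot be derived from records of tested pupils alone; they would have to come from a separate roll count.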

The quantity r[i,s,k,g,d,v/b] is of interest not so much in itself but for its contribution to the aggregate r[.,s,k,g,d,v/b], which is equal to the roll size of group kgd in school s, provided school s contributed to the kgd sample at level v, and zero otherwise.
Under certain circumstances, the actual number of group kgd pupils sampled in a particular school, p[.,s,k,g,d,v/b], can turn out to be greater than r[.,s,k,g,d,v/b], the reported group roll size. To avoid such paradoxes, the following composite value, written here as r'[.,s,k,g,d,v/b], is used in practice when computing weightings:
{1} $r'[.,s,k,g,d,v/b] = \max\bigl(r[.,s,k,g,d,v/b],\ p[.,s,k,g,d,v/b]\bigr)$
Each pupil in school s, tested at level v, with gender g and deprivation index d, has weighting:
{2} $w[i,s,k,g,d,v/b] = \dfrac{r'[.,s,k,g,d,v/b]}{p[.,s,k,g,d,v/b]} \times \dfrac{p[.,.,k,g,d,*/b]}{r'[.,.,k,g,d,v/b]}$
The first part of {2} is the ratio of the total roll of group kgd pupils in school s to the total number of group kgd pupils in the same school s tested at level v. It represents the weight associated with school s in group kgd at level v.
The second part of {2} is the weight associated with the whole of authority band b, computed as the ratio of the total group kgd roll in authority band b to the total group roll size considering only schools in that authority which contributed to the kgd sample at level v.
Summing {2} over pupils and schools, we should obtain
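{3} $\sum_{s}\sum_{i} w[i,s,k,g,d,v/b] = p[.,.,k,g,d,*/b]$
where the sums run over the sampled pupils i in the schools s of authority band b.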
In other words, the sum of weights of all sampled pupils at level v in group kgd within an authority band should equal the total population roll size for that group within the band.
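The two-part weight and the check on the sum of weights can be sketched in Python for a single group kgd at one level within one authority band. The school names, roll sizes and function name below are made up for illustration, and taking the larger of the reported roll and the number tested as the composite roll is an assumption based on the description of the composite value above.

```python
def pupil_weights(schools, band_roll):
    """Return {school: weight of each tested pupil} for one group and level.

    schools   -- {school: (reported_roll, number_tested)}, contributing schools only
    band_roll -- total roll of the group in the authority band
    """
    # Composite roll: never smaller than the number of pupils actually tested
    # (assumed reading of the composite value described in the text).
    composite = {s: max(roll, tested) for s, (roll, tested) in schools.items()}
    contributing_roll = sum(composite.values())

    weights = {}
    for s, (_, tested) in schools.items():
        school_part = composite[s] / tested        # school roll per tested pupil
        band_part = band_roll / contributing_roll  # band roll per unit of contributing roll
        weights[s] = school_part * band_part
    return weights


# Hypothetical band: group roll of 400, three contributing schools.
schools = {"school_a": (30, 5), "school_b": (12, 4), "school_c": (2, 3)}
weights = pupil_weights(schools, band_roll=400)

# Summing the weights over every tested pupil recovers the band roll.
total = sum(weights[s] * tested for s, (_, tested) in schools.items())
```

In this sketch `school_c` reports a roll of 2 but tested 3 pupils, so its composite roll is 3 and each of its pupils carries the band part of the weight only.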
It is often convenient to normalise the basic weighting by dividing by the total roll size and multiplying by 100:
{4} $w'[i,s,k,g,d,v/b] = 100 \times \dfrac{w[i,s,k,g,d,v/b]}{p[.,.,k,.,.,*/b]}$
By substituting the total population roll size at stage k, p[.,.,k,.,.,*/.], for the divisor in {4}, we obtain the normalised weight for a pupil within the country, rather than within the authority alone.
To restrict attention to a particular group, we simply do not aggregate over the group. For example, the expression for the weight for pupils in a given school, restricted to deprived girls, considered within the authority band, would be:
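{5} $w[i,s,k,G,1,v/b] = \dfrac{r'[.,s,k,G,1,v/b]}{p[.,s,k,G,1,v/b]} \times \dfrac{p[.,.,k,G,1,*/b]}{r'[.,.,k,G,1,v/b]}$
where r' denotes the composite roll value of {1}.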
The corresponding normalised weighting would be:
{6} $w'[i,s,k,G,1,v/b] = 100 \times \dfrac{w[i,s,k,G,1,v/b]}{p[.,.,k,G,1,*/b]}$
Now define 0 ≤ f[i,s,k,g,d,v/b] ≤ 1 as the proportion of correct marks scored by pupil i from school s in the level v assessment. f[i,s,k,g,d,v/b] is undefined for p[i,s,k,g,d,v/b] = 0.
f[i,s,k,g,d,v/b] and p[i,s,k,g,d,v/b] can be abbreviated to f[i,s,v] and p[i,s,v], respectively, where there is no ambiguity. Similarly, we can usually abbreviate w[i,s,k,g,d,v/b] to w[i,s,v/b], when there is no risk of ambiguity.
Now write f_p(i,s,v) = 1 when f[i,s,v] ≥ p, and 0 otherwise. Then f_0.5(i,s,v) = 1 characterises a "good start" at level v in the subject, a pupil showing f_0.65(i,s,v) = 1 is deemed to have "well-established" skills at level v, and pupils such that f_0.8(i,s,v) = 1 are said to have "very good" attainment at level v. f_p(i,s,v) can be written in full as f_p(i,s,k,g,d,v/b), when necessary to avoid ambiguity.
If, now, for each sampled pupil in the group of interest, we multiply f_p(i,s,v) by w[i,s,v/b] and sum over all pupils in the group, we obtain an estimate of the number of pupils in the corresponding population group who achieve a proportion of at least p.
For example
{7} $\sum_{s}\sum_{i} f_{0.65}(i,s,k,g,d,v/b)\; w[i,s,k,g,d,v/b]$
estimates the number of pupils at stage k in authority band b, of gender g and deprivation index d, achieving a "well-established" result at level v.
To express the same quantity as a percentage of all pupils at stage k in band b, of gender g and deprivation index d relative to level v, replace w[i,s,k,g,d,v/b] with the normalised w', as in
{8} $\sum_{s}\sum_{i} f_{0.65}(i,s,k,g,d,v/b)\; w'[i,s,k,g,d,v/b]$
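The weighted estimate of attainment can be sketched as follows. The pupil scores and weights are invented, and the function name is hypothetical. As a simplifying assumption, the percentage is obtained by dividing by the sum of the sampled pupils' weights, which equals the group roll whenever the weights sum to the roll as described earlier.

```python
def weighted_attainment(pupils, threshold):
    """Estimate how many pupils in a group reach `threshold`, and what percentage.

    pupils -- list of (proportion_correct, weight) pairs for the sampled pupils
    """
    # A pupil counts towards the estimate when the proportion of correct
    # marks is at least the threshold (the indicator f_p in the text).
    reached = sum(weight for score, weight in pupils if score >= threshold)
    total = sum(weight for _, weight in pupils)
    return reached, 100.0 * reached / total


# Hypothetical sample of four pupils from one group kgd at one level.
pupils = [(0.70, 50.0), (0.60, 50.0), (0.90, 25.0), (0.40, 75.0)]

# "Well-established" corresponds to a threshold of 0.65.
number, percentage = weighted_attainment(pupils, threshold=0.65)
```

Here two of the four sampled pupils reach the threshold, but because they carry 75 of the 200 units of weight the estimated percentage is 37.5 rather than 50.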
I.8 Estimating standard errors through the jackknife procedure
The weighting methodology is designed to reduce the effects of any sampling or response bias, but, as with any sample survey, there is always a degree of uncertainty in the SSA results. The likely extent of the sampling variability can be quantified by calculating the 'standard error' associated with an estimate produced from a random sample.
Statistical sampling theory states that, on average:
- Only about one sample in three would produce an estimate that differed from the (unknown) true value by more than one standard error.
- Only about one sample in twenty would produce an estimate that differed from the true value by more than two standard errors.
- Only about one sample in 400 would produce an estimate that differed from the true value by more than three standard errors.
By convention, the '95% confidence interval' is defined as the estimate plus or minus about twice the standard error because there is only a 5% chance (on average) that a sample would produce an estimate that differs from the true value of that quantity by more than this amount.
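These rules of thumb can be checked numerically. The sketch below assumes that the sampling distribution of the estimate is approximately normal, which is the usual basis for such statements but is an assumption not spelled out above; the function name is illustrative.

```python
import math


def two_sided_tail(k):
    """Chance that a normally distributed estimate lies more than
    k standard errors from the true value, in either direction."""
    return math.erfc(k / math.sqrt(2))


for k in (1, 2, 3):
    p = two_sided_tail(k)
    print(f"more than {k} SE: {p:.4f} (about one sample in {1 / p:.0f})")

# The conventional 95% interval uses 1.96 rather than exactly 2 standard errors.
chance_outside_95 = two_sided_tail(1.96)
```

Under the normal assumption the probabilities are roughly one in 3, one in 22 and one in 370, which the rounded figures quoted above approximate.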
The standard error of an estimated proportion will depend upon several things, mainly the value of the estimate and the size of the sample (or sub-sample) from which it was calculated. It is worth noting that if the estimate is 0 or 100 percent, then the standard error for the estimate will be equal to 0. This does not mean that we are sure that the true population proportion will be 0 or 100 percent also, but this is our estimate from the sample drawn.
The standard error can be calculated in a number of ways, but for the SSA it has been calculated using the jackknife procedure. The SSA sample is selected using a complex multi-stage sampling technique, which means that the standard formulas used to calculate the standard error from a simple random sample would underestimate the standard error.
The jackknife technique was chosen because it provided unbiased estimates of the sampling errors of the percentages that the SSA usually reports on.
The jackknife procedure is often referred to as the 'leave one out' method. The idea of the jackknife procedure is that, given a dataset with n observations (or sampling units), n re-sampled datasets are created by excluding each observation in turn from the original dataset. The new datasets are very similar, but the variability among them allows us to calculate an unbiased estimate of the standard error of the original dataset.
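The leave-one-out idea can be sketched in Python using the standard delete-one jackknife formulas. This is an illustration only: the report does not say which sampling units (pupils or schools) were left out in the SSA calculations, so the units, weights and function names below are hypothetical.

```python
import math


def jackknife_se(units, estimator):
    """Delete-one jackknife: return (jackknife estimate, standard error)."""
    n = len(units)
    full_estimate = estimator(units)

    # Recompute the estimate n times, leaving out one sampling unit each time.
    leave_one_out = [estimator(units[:i] + units[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n

    jack_estimate = n * full_estimate - (n - 1) * mean_loo
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return jack_estimate, math.sqrt(variance)


def weighted_percentage(units):
    """Weighted percentage reaching a threshold; units are (reached, weight) pairs."""
    reached = sum(weight for flag, weight in units if flag)
    return 100.0 * reached / sum(weight for _, weight in units)


# Hypothetical sampled pupils: (reached the threshold?, weight).
units = [(True, 53.3), (False, 53.3), (True, 26.7), (False, 8.9), (True, 8.9)]
estimate, standard_error = jackknife_se(units, weighted_percentage)
```

For a simple unweighted mean this procedure reproduces the familiar standard error, the sample standard deviation divided by the square root of n, which is a useful sanity check on any implementation.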
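The first stage in calculating the jackknife estimate of the standard error is to calculate n estimates $\hat{\theta}_{(1)}, \ldots, \hat{\theta}_{(n)}$, where, for each i in 1 to n, $\hat{\theta}_{(i)}$ is obtained by excluding the i th observation so that each $\hat{\theta}_{(i)}$ is calculated with a sample size of n-1. From this it is then possible to calculate the standard error of the estimate by looking at how the jackknife estimates vary around the sample estimate.
The mean of the $\hat{\theta}_{(i)}$ is defined as:
$$\hat{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$$
The jackknife estimate of the statistic is defined as:
$$\hat{\theta}_{jack} = n\,\hat{\theta} - (n-1)\,\hat{\theta}_{(\cdot)}$$
where $\hat{\theta}$ is the estimate based on all n observations.
The variance of the estimator $\hat{\theta}$ is equal to:
$$\widehat{\mathrm{Var}}(\hat{\theta}) = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^{2}$$
The jackknife estimate of the standard error of $\hat{\theta}$ is:
$$SE(\hat{\theta}) = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^{2}}$$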
I.9 Statistical significance
Because the survey's estimates may be affected by sampling errors, apparent differences of a few percentage points between sub-samples may not reflect real differences in the population. It might be that the true values in the population are similar, but the random selection of pupils for the survey has, by chance, produced a high estimate for one sub-sample and a low estimate for the other.
Throughout the report, a number of differences are referred to as being statistically significant. Usually, if something is described as being significant it means that it is important or special, but this is not the case when talking about statistical significance. A difference between two sub-groups is statistically significant if it is so large that a difference of that size (or greater) is unlikely to have occurred purely by chance.
When analysing the SSA data, statistical tests were used to compare the results from different sub-groups. If the differences between the sub-groups are large enough and the standard errors of the estimates are small enough, then we can say that the differences are likely to be genuine features of the population and that they are statistically significant.
For a crude check, if the difference between sub-groups is more than twice the sum of the standard errors of the two groups, then the difference is statistically significant. If the difference is less than double the larger of the two groups' standard errors, then the difference is not statistically significant. Otherwise, a statistical test is needed to determine statistical significance.
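The crude check can be written as a three-way rule. The sketch below is illustrative only; the function name and the example percentages and standard errors are invented.

```python
def crude_significance_check(est_a, se_a, est_b, se_b):
    """Apply the crude rule to two estimates and their standard errors."""
    difference = abs(est_a - est_b)
    if difference > 2 * (se_a + se_b):
        return "significant"
    if difference < 2 * max(se_a, se_b):
        return "not significant"
    return "formal test needed"


# Hypothetical percentages with their standard errors.
clear_gap = crude_significance_check(62.0, 1.5, 55.0, 1.2)
small_gap = crude_significance_check(62.0, 1.5, 60.0, 1.2)
borderline = crude_significance_check(62.0, 1.5, 58.0, 1.2)
```

The rule works because the standard error of a difference between two independent estimates, the square root of the sum of the squared standard errors, always lies between the larger of the two standard errors and their sum. The two bounds therefore settle the clear cases and leave only the borderline ones for a formal test.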
All the statistical tests in the SSA report are carried out at the 5% level, which means that a difference is considered significant only if a difference of that size or larger would be expected to arise by chance in no more than one sample in 20 when there is no real difference in the population. Generally speaking, this means that a difference is reported as statistically significant only when we can be confident, at the conventional 95% level, that it is a genuine feature of the population and not due to random variation.
A two sided independent t test has been used to check for statistical significance and the null hypothesis has always been of no difference. This allows the t value to be calculated using the following formula:
$$t = \frac{\hat{\theta}_{1} - \hat{\theta}_{2}}{\sqrt{SE_{1}^{2} + SE_{2}^{2}}}$$
where $SE_{1}$ and $SE_{2}$ are the standard errors of the two estimates and $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$ are our estimates for the two groups.
Statistical sampling theory suggests that a difference is significant at the 5% level if the absolute value of the t value is greater than or equal to 1.96.
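The test can be sketched in a few lines of Python. It assumes the usual form of the statistic for two independent estimates, namely the difference divided by the square root of the sum of the squared standard errors; the function names and example figures are hypothetical.

```python
import math


def t_value(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates in units of its standard error."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def significant_at_5_percent(est_a, se_a, est_b, se_b, critical=1.96):
    """Two-sided test: the sign of the difference does not matter."""
    return abs(t_value(est_a, se_a, est_b, se_b)) >= critical


# Hypothetical comparison: 62% (SE 1.5) against 58% (SE 1.2).
t = t_value(62.0, 1.5, 58.0, 1.2)
is_significant = significant_at_5_percent(62.0, 1.5, 58.0, 1.2)
```

In this example the difference of 4 percentage points has a standard error of about 1.92, giving a t value of about 2.08, which exceeds 1.96.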
Calculations of confidence intervals and statistical significance only take sampling variability into account. The survey's results could also be affected by non-response bias. If the characteristics of the pupils who participated differed markedly from those of pupils who were withdrawn or absent, there might be bias in the estimates. If that is the case, the SSA's results will not be representative of the whole population.
Without knowing the true values (for the population as a whole) of some quantities, we cannot be sure about the extent of any such biases in the SSA. However, comparison of SSA results with information from other sources suggests that they are broadly representative of the overall Scottish population, and therefore that any non-response biases are not large overall or are corrected by the weightings. Such biases could nevertheless be more significant for some sub-groups of the population or in certain Education Authority areas, particularly those with the highest non-response rates.