Assessment of Achievement Programme: Report of the Sixth AAP Survey of Science (2003)
Appendix B: Sampling, task distribution and attainment estimation
B.1 School and pupil sampling
The 2003 science survey was designed to assess the science and core skills attainment of pupils at P3, P5, P7 and S2 in mainstream schools in Scotland, including education authority, self-governing, grant-aided and independent schools. Special schools and Gaelic medium schools were excluded from the sampling frame.
Representative pupil samples were selected for testing using two-stage proportionate stratified sampling, with an overall sampling fraction of just over 5% of the pupil population. Separate school samples were drawn for the four pupil stages. Before school sampling began, the population of maintained schools was first stratified by education authority grouping, roll size and percentage free school meals entitlement. The 32 education authorities were classified into four groups for this purpose, based on their general population densities (see Table B.1). Schools were grouped into two size bands (primary schools: under 280 pupils on roll, and 280 or more on roll; secondary schools: under 150 S2 pupils on roll, and 150 or more S2 pupils on roll) and two bands for free school meals entitlement (primary schools: <10%, and 10% or more; secondary schools: <15%, and 15% or more). School size and free school meals entitlement classifications were based on the most recent school census data available at the time, viz. census data at September 2001 and at January 2002, respectively. Independent schools formed a separate national stratum.
Table B.1 Education authority groupings
(Based on general population density)
Group 1 | Group 2 | Group 3 | Group 4 |
Aberdeen City | East Dunbartonshire | Clackmannanshire | Aberdeenshire |
Dundee City | East Renfrewshire | East Ayrshire | Angus |
Edinburgh City | Falkirk | East Lothian | Argyll & Bute |
Glasgow City | Inverclyde | Fife | Dumfries & Galloway |
| North Lanarkshire | Midlothian | Eilean Siar |
| Renfrewshire | North Ayrshire | Highland |
| West Dunbartonshire | South Ayrshire | Moray |
| West Lothian | South Lanarkshire | Orkney Islands |
| | | Perth & Kinross |
| | | Scottish Borders |
| | | Shetland Islands |
| | | Stirling |
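The stratification described above can be sketched in code. This is a minimal illustration only: the thresholds come from the text, but the `School` record, the band labels and the `stratum` function are hypothetical names introduced here, not part of the survey's actual procedures.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
SIZE_THRESHOLD = {"primary": 280, "secondary": 150}    # whole roll / S2 roll
FSM_THRESHOLD = {"primary": 10.0, "secondary": 15.0}   # % free school meals entitlement


@dataclass(frozen=True)
class School:
    sector: str            # "primary" or "secondary"
    authority_group: int   # 1 to 4, from Table B.1
    roll: int              # pupils on roll (primary) or S2 pupils on roll (secondary)
    fsm_percent: float     # percentage free school meals entitlement
    independent: bool = False


def stratum(school: School) -> tuple:
    """Return an illustrative stratum label for one school."""
    if school.independent:
        # Independent schools formed a separate national stratum.
        return ("independent",)
    size_band = "large" if school.roll >= SIZE_THRESHOLD[school.sector] else "small"
    fsm_band = "high" if school.fsm_percent >= FSM_THRESHOLD[school.sector] else "low"
    return (school.authority_group, size_band, fsm_band)
```

Under this reading there are at most 4 x 2 x 2 = 16 strata for the remaining schools in each sector, plus the independent stratum.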
Schools were selected from within strata, without replacement, with probability proportional to stage size. At each stage around 200 schools were selected and invited to participate in the survey, and 70-80% agreed to do so (see Table B.2).
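The report does not say which algorithm was used to select schools with probability proportional to size. One common approach is systematic PPS sampling, sketched below as an assumption rather than a description of the survey's method. The function name and the example sizes are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Select n distinct unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end, then take n equally spaced
    points from a random start. Requires every size <= total / n so that
    no unit can be hit twice (sampling without replacement).
    """
    total = sum(sizes)
    interval = total / n
    if max(sizes) > interval:
        raise ValueError("a unit is larger than the sampling interval")
    cumulative = list(accumulate(sizes))
    start = rng.random() * interval
    points = [start + k * interval for k in range(n)]
    # bisect_right finds the unit whose size range contains each point;
    # the min() guards against floating-point overshoot at the far end.
    return [min(bisect_right(cumulative, p), len(sizes) - 1) for p in points]


# Example: stage rolls for eight hypothetical schools in one stratum.
stage_rolls = [30, 60, 25, 45, 80, 20, 50, 40]
chosen = pps_systematic(stage_rolls, 3, random.Random(0))
```

In practice a routine like this would be run separately within each stratum and for each of the four stages.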
The sample pupils were selected in a second stage of sampling, from within those schools that had agreed to participate in the survey. Wherever possible, i.e. in those schools with sufficient numbers of pupils available in the stage concerned, 22 pupils were randomly selected within each survey school - 10 for involvement in the assessment of Knowledge and Understanding in science (and in the assessment of numeracy), six for involvement in the assessment of reading (and writing at P5, P7 and S2), and six reserve pupils, to act as substitutes for pupils absent on the assessment days. In composite classes, only pupils at the relevant stage were selected.
Table B.2 School participation
| P3 | P5 | P7 | S2 |
Schools invited to participate | 214 | 212 | 213 | 191 |
Schools agreeing to participate | 164 | 162 | 167 | 139 |
Schools returning completed science test booklets | 150 | 156 | 156 | 130 |
% participation rate for science among invited schools | 70 | 74 | 73 | 68 |
Schools eligible for reading assessments* | 125 | 131 | 136 | 130 |
Schools returning completed reading test booklets | 99 | 93 | 114 | 130 |
% participation rate for reading among eligible agreeing schools | 79 | 71 | 84 | 100 |
*These were schools that had sufficient sample pupils to participate in both the science assessment and the reading assessment
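The participation rates in Table B.2 can be reproduced from the counts in the same table. The short check below uses only figures from the table; the variable and function names are illustrative.

```python
STAGES = ("P3", "P5", "P7", "S2")

# Counts copied from Table B.2.
invited = dict(zip(STAGES, (214, 212, 213, 191)))
science_returned = dict(zip(STAGES, (150, 156, 156, 130)))
reading_eligible = dict(zip(STAGES, (125, 131, 136, 130)))
reading_returned = dict(zip(STAGES, (99, 93, 114, 130)))


def participation_rate(returned, base):
    """Percentage of base schools that returned booklets, to the nearest whole number."""
    return round(100 * returned / base)


science_rates = {s: participation_rate(science_returned[s], invited[s]) for s in STAGES}
reading_rates = {s: participation_rate(reading_returned[s], reading_eligible[s]) for s in STAGES}
# science_rates -> {'P3': 70, 'P5': 74, 'P7': 73, 'S2': 68}
# reading_rates -> {'P3': 79, 'P5': 71, 'P7': 84, 'S2': 100}
```

Note that the science rate is calculated on invited schools, while the reading rate is calculated on eligible schools that agreed to participate.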
Where schools had too few pupils available at the relevant stage to supply 10 for science assessment and six for reading assessment then science took priority. In other words, where schools had fewer than 16 pupils available in the relevant stage, 10 were identified at random for science assessment and the remainder, fewer than six, would then take reading tasks. Where schools had 10 or fewer pupils available at the relevant stage all of these were identified for involvement in science assessment, and none would do reading. There were thus some schools in the survey sample that would take part in science assessment only.
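The priority rule for small schools can be expressed as a short allocation routine. This is a sketch under assumptions: the function name is hypothetical, and the treatment of schools with 16 to 21 pupils (full science and reading groups, with a shortened reserve list) is inferred rather than stated in the text.

```python
import random


def allocate_pupils(pupils, rng, n_science=10, n_reading=6, n_reserve=6):
    """Randomly split a school's stage pupils into science, reading and reserve groups.

    Science is filled first, then reading, then reserves, so science
    takes priority whenever too few pupils are available.
    """
    shuffled = list(pupils)
    rng.shuffle(shuffled)
    science_end = n_science
    reading_end = science_end + n_reading
    reserve_end = reading_end + n_reserve
    return {
        "science": shuffled[:science_end],
        "reading": shuffled[science_end:reading_end],
        "reserve": shuffled[reading_end:reserve_end],
    }


# A school with 13 pupils at the stage: 10 take science, 3 take reading.
groups = allocate_pupils(range(13), random.Random(1))
```

With 10 or fewer pupils the reading and reserve groups come out empty, matching the science-only schools mentioned above.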
Where pupils with special educational needs were selected in school samples, these were included in the test sessions at the head teacher's discretion.
In a subset of the schools the 'science' pupils also took part in the assessment of practical investigation skills, while the 'reading' pupils took part in the assessment of ICT skills or participated in focus group discussions exploring their informed attitudes in science. Although the 'practical' schools were drawn from across the country, they were not selected entirely at random: two important criteria for involvement were (i) that the school should have sufficient pupils at the stage concerned to justify a day visit by two field officers, and (ii) that it should be within easy travelling distance of the field officers' home bases. In the event, just over half of the primary schools that agreed to participate in the survey were involved in the practical assessments (87 at P3, 85 at P5 and 94 at P7), as were around two-thirds of the secondary schools (90 schools at S2).
B.2 Task distribution and achieved sample sizes
Science and numeracy
In order to assess pupils' Knowledge and understanding in science and to report attainment in terms of the 5-14 levels, 360 different pencil and paper single-level tasks were administered in this survey. These comprised 60 tasks at each of Levels A to F, with 20 from each outcome (Earth and space, Energy and forces, Living things and the processes of life). Task administration followed a multiple matrix sampling strategy.
At each stage 10 different Knowledge and Understanding booklets were prepared for survey administration, by randomly allocating tasks to booklets to meet a given booklet specification. The ten booklets were paired into ten different booklet pairs, and booklet pairs were allocated randomly to the sample pupils in each school. Thus every 'science' pupil attempted, or was intended to attempt, two different test booklets, with every booklet eventually attempted by similar numbers of pupils in similarly representative pupil subsamples. In any one school at most two pupils would attempt the same booklet.
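The report does not specify how the ten booklets were combined into ten pairs. One arrangement that satisfies the stated properties is a cyclic pairing, in which each booklet appears in exactly two pairs. The sketch below assumes that arrangement; the function names are hypothetical.

```python
import random
from collections import Counter


def cyclic_pairs(n_booklets=10):
    """Pair booklet b with booklet b+1, wrapping round: (1,2), (2,3), ..., (10,1)."""
    return [(b, b % n_booklets + 1) for b in range(1, n_booklets + 1)]


def allocate_pairs(pupils, pairs, rng):
    """Give each pupil a different, randomly chosen booklet pair."""
    if len(pupils) > len(pairs):
        raise ValueError("more pupils than booklet pairs")
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return dict(zip(pupils, shuffled))


pairs = cyclic_pairs()
school = [f"pupil{i}" for i in range(1, 11)]       # ten 'science' pupils
allocation = allocate_pairs(school, pairs, random.Random(3))

# How many pupils in this school meet each booklet.
booklet_use = Counter(b for pair in allocation.values() for b in pair)
```

Because each booklet belongs to exactly two pairs and each pair is used at most once per school, no booklet is attempted by more than two pupils in a school.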
Since there was no information available beforehand about the likely level that each pupil was currently working at in science, it was not considered appropriate to create single-level test booklets and to place these in front of randomly selected pupils. Every booklet therefore contained tasks from at least two different levels: A and B at P3, B and C at P5, C, D and E at P7, D, E and F at S2. Every booklet also contained a balanced spread of tasks across the three outcomes. To facilitate attainment comparisons across stages, the Level B tasks which featured in a particular test booklet at P3, mixed at this stage with Level A tasks, were transferred into one of the test booklets at P5, to be mixed with Level C tasks, and so on. Within booklets, tasks were grouped by outcome, and within outcome blocks lower level tasks were presented before higher level tasks. A single numeracy task was placed in every booklet, at the end of one of the outcome blocks. Every booklet was printed in three different versions, simply by varying the order of presentation of outcome blocks, to minimise any possible fatigue effects on any particular tasks.
At P3 and P5, each booklet contained 12 different science tasks, two per level from each outcome, plus a numeracy task, and was expected to take 30-40 minutes to complete. At P7 and S2, each booklet contained 18 science tasks, again two per level from each outcome, plus a numeracy task, and was expected to take 50-60 minutes to complete.
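The booklet sizes follow directly from the design: two tasks per level from each of three outcomes. The arithmetic below also shows that ten booklets per stage account for the 60 tasks available at each level, assuming each task appears in only one booklet at a given stage. The constant and function names are illustrative.

```python
# Levels covered by the booklets at each stage, as stated in the text.
STAGE_LEVELS = {"P3": "AB", "P5": "BC", "P7": "CDE", "S2": "DEF"}
OUTCOMES = 3                      # Earth and space, Energy and forces, Living things
TASKS_PER_LEVEL_PER_OUTCOME = 2
BOOKLETS_PER_STAGE = 10


def science_tasks_per_booklet(stage):
    """Science tasks in one booklet (the numeracy task is extra)."""
    return len(STAGE_LEVELS[stage]) * OUTCOMES * TASKS_PER_LEVEL_PER_OUTCOME


def tasks_per_level_at_a_stage():
    """Tasks at one level across all ten booklets of a stage."""
    return BOOKLETS_PER_STAGE * OUTCOMES * TASKS_PER_LEVEL_PER_OUTCOME


all_levels = sorted(set("".join(STAGE_LEVELS.values())))   # A to F
total_tasks = len(all_levels) * tasks_per_level_at_a_stage()
# science_tasks_per_booklet("P3") -> 12, science_tasks_per_booklet("S2") -> 18
# tasks_per_level_at_a_stage() -> 60, total_tasks -> 360
```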
Almost 6000 pupils in around 600 schools participated in the written assessment of science Knowledge and understanding: response data were analysed for 1405 P3 pupils, 1463 P5 pupils, 1483 P7 pupils and 1306 S2 pupils. At each stage these figures represent between 2% and 2_% of the pupil population. Each test booklet, and therefore every assessment task, was attempted by around 270 pupils at P3, by around 290 pupils at P5 and P7, and by around 250 pupils at S2.
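As a rough check on the attempts per booklet, an upper bound can be derived from the pupil counts: if every analysed pupil had completed both booklets, each of the ten booklets would have been attempted by one fifth of the pupils at that stage. The reported figures sit a little below this bound, which is consistent with some pupils not completing both booklets. The names below are illustrative.

```python
# Pupil counts and approximate attempts per booklet, as reported in the text.
analysed = {"P3": 1405, "P5": 1463, "P7": 1483, "S2": 1306}
reported_attempts = {"P3": 270, "P5": 290, "P7": 290, "S2": 250}

BOOKLETS_PER_PUPIL = 2
BOOKLETS_PER_STAGE = 10


def max_attempts_per_booklet(pupils):
    """Attempts per booklet if every pupil completed both of their booklets."""
    return pupils * BOOKLETS_PER_PUPIL / BOOKLETS_PER_STAGE


upper_bounds = {stage: max_attempts_per_booklet(n) for stage, n in analysed.items()}
```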
Reading and writing
There were 15 reading tasks in total, three at each of Levels A to E. Each task comprised a text and associated test questions, and was expected to take the same time to complete as a science booklet at the stage concerned. At Levels C, D and E reading tasks were accompanied by associated writing tasks. An individual reading task, where relevant in company with its linked writing task, was presented to pupils as a single test booklet.
Again, a multiple matrix sampling scheme was employed to allocate tasks to pupils. The tasks to be administered at a particular stage were paired into six different pairs in such a way that every pair comprised tasks from two adjacent levels. Task pairs were then randomly allocated to the pupils in each school that had agreed to participate in the survey and that had pupils available for reading assessment. In this way every task would have been attempted by similar numbers of pupils across the survey, in similarly representative subsamples, and no more than two pupils would attempt the same task in any particular school.
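The exact pairing of reading tasks is not given in the report. As an illustration only, the sketch below assumes a stage at which tasks from two adjacent levels are used, three tasks per level, and builds six pairs so that each task appears in exactly two of them. The task labels and function name are hypothetical.

```python
from collections import Counter


def adjacent_level_pairs(lower, upper):
    """Build 2 * n pairs, each joining one lower-level and one upper-level task.

    Task i at the lower level is paired with tasks i and i+1 (wrapping
    round) at the upper level, so every task appears in exactly two pairs.
    """
    if len(lower) != len(upper):
        raise ValueError("both levels need the same number of tasks")
    n = len(lower)
    pairs = []
    for i in range(n):
        pairs.append((lower[i], upper[i]))
        pairs.append((lower[i], upper[(i + 1) % n]))
    return pairs


# Hypothetical labels for the three tasks at each of two adjacent levels.
reading_pairs = adjacent_level_pairs(["A1", "A2", "A3"], ["B1", "B2", "B3"])
task_use = Counter(task for pair in reading_pairs for task in pair)
```

With six 'reading' pupils per school and six pairs, giving each pupil a different pair means that no task is attempted by more than two pupils in a school.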
In total, reading assessment data were analysed for 2564 pupils (586 at P3, 541 at P5, 665 at P7 and 772 at S2) and writing assessment data were analysed for a total of 1957 pupils (521 at P5, 680 at P7, 756 at S2). At each stage these figures represent 1% to 1_% of the pupil population. Each task was attempted by around 185 pupils at P3, 175 at P5, 215 at P7 and 245 at S2.
Investigative skills in science, ICT skills and informed attitudes
Nine investigation tasks were administered in the survey, along with six ICT tasks. In addition, the 148 field officers who carried out the practical assessments also led a total of 647 focus group discussions with pupils. As usual, tasks and focus group participations were allocated to pupils at random.
In the majority of the schools that participated in assessment in this area, eight pupils were assessed for their science investigation skills and a further eight for ICT skills. Performance data were analysed for a total of 2635 pupils for science investigations and 2611 pupils for ICT: 609 and 615, respectively, at P3; 619 and 625, respectively, at P5; 710 and 697, respectively, at P7; 697 and 674, respectively, at S2. The number of pupils who undertook any particular task varied between 150 and 450, depending on the subject, stage and task. Among the 647 focus groups that were rated for informed attitudes, 80-90 groups discussed one or other of the two topics that featured at P5, P7 and S2, while 171 groups discussed the single topic that featured at P3.
B.3 Attainment estimation
In science and reading, total scores were first computed for pupils for each of their level-based 'tests': 12 tasks at a level in science, offering total maximum marks of 12-14 at P3 and P5 and 20-30 at P7 and S2, and one task at a level in reading, offering total marks of between 18 and 35, depending on the level. Cut-off scores were then applied, and pupils were classified into one of three attainment groups on the basis of these: 'basic skills', 'secure attainment' or 'considerable strengths'. The proportions of pupils classified into the three groups at relevant levels were calculated separately for every booklet pair in science and for every reading task, with the attainment data weighted appropriately to adjust for imbalances in sample representation caused by the non-participation of some schools. The resulting proportions were then simply averaged over pairs of science booklets (ten pairs per stage) or reading tasks (three per level) to produce the population attainment estimates reported in Chapters 2 and 4, respectively.
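The estimation steps can be sketched as follows. The cut-off values, weights and scores in the example are invented for illustration, and the sketch assumes that every pupil below the 'secure attainment' cut-off falls into the 'basic skills' group. The function names are hypothetical.

```python
GROUPS = ("basic skills", "secure attainment", "considerable strengths")


def classify(score, secure_cut, strengths_cut):
    """Place one level-test score into one of the three attainment groups."""
    if score >= strengths_cut:
        return "considerable strengths"
    if score >= secure_cut:
        return "secure attainment"
    return "basic skills"


def weighted_proportions(records, secure_cut, strengths_cut):
    """Weighted share of pupils in each group for one booklet pair or reading task.

    records is an iterable of (score, weight) tuples, where the weight
    adjusts for imbalances in sample representation.
    """
    totals = dict.fromkeys(GROUPS, 0.0)
    for score, weight in records:
        totals[classify(score, secure_cut, strengths_cut)] += weight
    weight_sum = sum(totals.values())
    return {group: total / weight_sum for group, total in totals.items()}


def average_over_subsamples(per_subsample):
    """Simple average of the group proportions across booklet pairs or tasks."""
    return {
        group: sum(p[group] for p in per_subsample) / len(per_subsample)
        for group in GROUPS
    }


# Two invented booklet pairs, with illustrative cut-offs of 6 and 10 marks.
pair_1 = weighted_proportions([(3, 1.0), (8, 1.0), (12, 2.0)], 6, 10)
pair_2 = weighted_proportions([(2, 1.0), (7, 3.0)], 6, 10)
estimate = average_over_subsamples([pair_1, pair_2])
```

In the survey itself the average would run over ten booklet pairs per stage in science, or three tasks per level in reading.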
Margins of error for the attainment estimates arising from a single booklet pair in science would be a maximum of around six percentage points, reducing to a maximum of around two percentage points for the final averaged population estimates at a level. Margins of error for the attainment estimates deriving from a single reading task would be a maximum of around seven percentage points, reducing to a maximum of around four percentage points for the final population estimates at a level. It should be noted that these figures cannot take account of any measurement error that will have arisen from the possible incorrect classification of individual pupils, for some of whom the decisions made might have been different had the pupils concerned been assessed on a different day or on the same day with a different reading task or pair of science booklets (test reliabilities - alpha values - are typically in the range 0.7-0.8 for the 12-task science 'tests', and 0.7-0.9 for each reading task). Neither do they take account of the measurement error that will have arisen from the fact that the tasks used in this survey are merely representative of all the similar tasks that might have been developed and used in their place.
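The reduction in the margin of error is roughly what would be expected when averaging independent estimates of similar precision: the margin shrinks by the square root of the number of estimates averaged. The sketch below applies that standard rule to the figures quoted above. Independence between subsamples is an assumption made here, not a statement from the report, and the function name is illustrative.

```python
import math


def averaged_margin(single_margin, n_estimates):
    """Margin of error for the mean of n independent, equally precise estimates."""
    return single_margin / math.sqrt(n_estimates)


science_margin = averaged_margin(6, 10)   # ten booklet pairs per stage
reading_margin = averaged_margin(7, 3)    # three reading tasks per level
# round(science_margin, 1) -> 1.9 percentage points
# round(reading_margin, 1) -> 4.0 percentage points
```

Both results agree with the maxima of around two and around four percentage points given in the text.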
In the case of writing, practising teachers evaluated pupils' scripts and allocated level judgments. As always with extended writing, judgments of quality were subjective to some extent, as the inter-rater agreement study described in Chapter 4 confirms: the average inter-rater agreement rate when applying a 'best fit' evaluation scheme was 40%. With this in mind, the resulting writing attainment data have been presented in Chapter 4 as sample statistics only.
Given the nature of the practical assessment tasks - which were novel in nature and which did not lend themselves to pupil classification by level - no attempt has been made to produce weighted estimates of practical skills attainment on this occasion. School and pupil questionnaire findings are also presented in this report as sample statistics rather than formal population estimates.