SCOTTISH INDEX OF MULTIPLE DEPRIVATION 2004: SUMMARY TECHNICAL REPORT
Chapter 3: Methodology
The SIMD 2004 is based on the techniques used by SDRC in their work to produce the SID 2003 and on the recommendations of the SCRSJ paper 'Measuring Deprivation in Scotland: Developing a long-term strategy'.
The principles behind the SIMD 2004 are consistent with the work of Townsend and others in defining and measuring deprivation and build on the techniques used to produce the recent indices in England, Wales and Northern Ireland. The premise is that deprivation is a multi-dimensional concept where standards are defined in relation to social norms or expectations. It is therefore a relative concept rather than an absolute one.
The techniques used are described in the following sections:
- Indicators
- Domains
- SIMD 2004
Indicators
Source numerator and denominator data was sought from the data suppliers outlined in Appendix A, who ensured that the data they provided was of sufficient quality to use in the index. The techniques employed to construct the indicators depended on the quality and availability of this raw data.
In the Current Income, Housing, Employment and Access domains, the data was sufficiently robust to require no further statistical input. In the Current Income, Housing and Employment domains the indicators are therefore simple counts, and in the Access domain the indicators are drive times.
However, it was necessary to apply the 'shrinkage technique' to the indicators in the Health and Education domains. This is in line with the SDRC methodology for the SID 2003 and is required to address the problem of small populations and unreliable scores.
Shrinkage is the term given to the procedure that improves the quality of small area data by combining the information for a single Data Zone with information about its Local Authority. Where the numbers for a particular indicator are very small, for example low birth weight, they may fluctuate significantly from year to year. In order to avoid minor changes in the indicator resulting in significant changes in the assessment of deprivation, the figures for each Data Zone are combined with the Local Authority average.
This 'borrowing' of strength to improve the quality of the estimate at Data Zone level is important to avoid spurious assessments of deprivation.
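The idea of borrowing strength can be illustrated with a minimal sketch. This chapter does not give the shrinkage formula, so the precision-weighted average below is an assumption chosen for illustration, and the function name and its inputs (zone_se, la_sd) are hypothetical rather than taken from the SDRC methodology.

```python
def shrink(zone_rate, zone_se, la_rate, la_sd):
    """Combine a Data Zone rate with its Local Authority average.

    zone_se: standard error of the Data Zone rate (hypothetical input).
    la_sd:   spread of Data Zone rates within the Local Authority
             (hypothetical input).
    The weighting scheme is an assumption, not the published formula.
    """
    if zone_se == 0:
        # A perfectly reliable Data Zone rate needs no adjustment.
        return zone_rate
    if la_sd == 0:
        # No variation within the Local Authority: use its average.
        return la_rate
    zone_precision = 1.0 / zone_se ** 2
    la_precision = 1.0 / la_sd ** 2
    # The less reliable the Data Zone rate, the smaller its weight.
    weight = zone_precision / (zone_precision + la_precision)
    return weight * zone_rate + (1.0 - weight) * la_rate


# An unreliable rate (large standard error) is pulled towards the
# Local Authority average; a reliable one barely moves.
unreliable = shrink(zone_rate=0.20, zone_se=0.10, la_rate=0.08, la_sd=0.05)
reliable = shrink(zone_rate=0.20, zone_se=0.01, la_rate=0.08, la_sd=0.05)
```

In this sketch a Data Zone with a small population, and hence a large standard error, moves most of the way towards the Local Authority average, while a well-measured Data Zone keeps a score close to its own rate.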
Domains
As with the indicators, it was appropriate to use different techniques to create the domain scores depending on the types of indicators they contain. For the Current Income and Employment domains, scores were calculated as a simple rate based on the sum of the indicator counts and using the appropriate population as the denominator. This was possible because their indicators are non-overlapping and present the percentage of the population affected by this type of deprivation.
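The simple rate described above can be sketched as follows. The function name and the example figures are hypothetical and are not taken from the index data.

```python
def domain_rate(indicator_counts, population):
    """Domain score as a percentage of the relevant population.

    indicator_counts: counts for the non-overlapping indicators in
    one Data Zone. population: the appropriate denominator.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * sum(indicator_counts) / population


# Hypothetical Data Zone: three indicator counts, 500 residents.
score = domain_rate([30, 12, 8], 500)
```

Because the indicators do not overlap, no person is counted twice and the result can be read directly as the percentage of the population affected.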
The Housing domain score was calculated by summing the indicator counts and dividing by the household population of the data zone. The indicators may overlap, but it is felt that in this case a person living in a household with both attributes is more deprived than a person living in a household with only one.
For the remaining three domains it was necessary to rank the indicators and transform them to a standard normal distribution before combining them using weights generated by factor analysis.
The transformation and factor analysis were carried out for two reasons. Firstly, the individual indicators may be measured in different ways. For example, in the Education domain it would not be appropriate simply to add an average SQA score to the number of pupils who have not successfully applied to higher education, because these are on different metrics. Secondly, factor analysis provides a method for calculating weights with which the various indicators within a domain can be combined. The premise is that the deprivation within a domain such as Health is imperfectly measured by each of the individual indicators. However, by combining the indicator data across each data zone a score can be produced that represents the underlying deprivation. The transformation to a standard normal distribution also prevents outliers, possibly resulting from measurement error, from having a disproportionate effect on the overall data zone scores.
In all cases, the factor analysis produced robust, single factor solutions explaining a large proportion of the variance.
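The rank-and-transform step can be sketched as below. This is an illustration under stated assumptions: the rank/(n + 1) convention for mapping ranks to normal scores is one common choice and is not specified in this chapter, ties are not averaged, higher indicator values are taken to mean more deprivation, and the weights are supplied by hand where the actual method derives them from factor analysis.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank the values and map the ranks to a standard normal scale.

    Assumption: rank r of n maps to the normal quantile at r / (n + 1).
    Ties are ranked in order of appearance, not averaged.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def domain_score(indicators, weights):
    """Weighted sum of normal-transformed indicators per data zone.

    indicators: one list of values per indicator, each covering the
    same data zones in the same order.
    weights: one weight per indicator (hypothetical values here).
    """
    transformed = [normal_scores(column) for column in indicators]
    n_zones = len(indicators[0])
    return [
        sum(w * column[z] for w, column in zip(weights, transformed))
        for z in range(n_zones)
    ]


# Two indicators on very different scales across three data zones.
scores = domain_score([[10, 30, 20], [1, 3, 2]], [0.5, 0.5])
```

After the transformation both indicators are on the same scale, so an extreme raw value affects the score only through its rank.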
SIMD 2004
Once the individual domain scores are calculated they are combined into the overall Scottish Index of Multiple Deprivation (SIMD 2004). In order to combine the domains, they are standardised to a uniform metric by ranking the domain scores. This is necessary because the domain scores are on different scales.
The next stage in combining the individual domains is to transform the standardised domain ranks. This transformation is necessary so that a desired degree of 'cancellation' can be introduced when combining the domains. This ensures that deprivation in one domain is not fully 'cancelled out' by lack of deprivation in another.
In line with the SDRC methodology, the exponential transformation of the ranks was chosen as the most appropriate method. This has the advantage that every domain is converted to an identical distribution with the same maximum and minimum values, while emphasising the most deprived 'tail' of the distribution.
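A minimal sketch of an exponential transformation of ranks is given below. This chapter does not state the formula, so the form and the constant 23 are assumptions taken from the version published for the English indices. The argument rank_from_least is a hypothetical name: it counts upwards from the least deprived data zone, so the most deprived data zone has the largest value.

```python
import math


def exponential_transform(rank_from_least, n_zones, k=23.0):
    """Map a rank to a score between 0 and 100.

    rank_from_least: 1 for the least deprived data zone, n_zones for
    the most deprived (an assumed convention for this sketch).
    k: assumed constant, borrowed from the English indices.
    """
    r = rank_from_least / n_zones
    return -k * math.log(1.0 - r * (1.0 - math.exp(-100.0 / k)))


# With 100 data zones, the most deprived scores 100 and the data zone
# on the edge of the most deprived 10% scores roughly 50.
top = exponential_transform(100, 100)
edge = exponential_transform(90, 100)
```

Under these assumptions the most deprived tenth of data zones occupies about half of the 0 to 100 range, which is how the transformation emphasises the deprived tail and limits the extent to which a low score in one domain can offset a high score in another.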
The SIMD 2004 score is then constructed by combining the exponentially transformed domain ranks using weights in the ratio 6 : 6 : 3 : 3 : 2 : 1, applied to the domains in the following order:
- Current Income
- Employment
- Health
- Education, Skills and Training
- Geographic Access and Telecommunications
- Housing
The weights are those used in the SID 2003, adjusted to allow the inclusion of the new Housing domain. The work by SDRC concluded that the Current Income and Employment domains should carry the most weight in the overall index, partly because these domains were the most robust and partly because this was in line with the academic literature on multiple deprivation. These conclusions remain relevant to the SIMD 2004, and the weights are therefore similar, with a slight adjustment to incorporate the Housing domain. The relatively small weight of the Housing domain reflects the limited amount of relevant data currently available for inclusion in the domain. As more and better data on poor housing conditions is developed, the relative weight is likely to increase.
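The weighted combination can be sketched as follows. The ratios are those given in this chapter; dividing by their total (21) so that the result stays on the scale of the transformed domain ranks is an assumption made for illustration, and the dictionary keys and function name are hypothetical.

```python
# Domain weights in the ratio 6 : 6 : 3 : 3 : 2 : 1.
DOMAIN_WEIGHTS = {
    "current_income": 6,
    "employment": 6,
    "health": 3,
    "education": 3,
    "access": 2,
    "housing": 1,
}


def simd_score(transformed_ranks):
    """Weighted average of the transformed domain ranks for one data zone.

    transformed_ranks: mapping from domain name to the exponentially
    transformed rank for that domain.
    Normalising by the total weight is an assumption of this sketch.
    """
    total_weight = sum(DOMAIN_WEIGHTS.values())
    weighted_sum = sum(
        weight * transformed_ranks[domain]
        for domain, weight in DOMAIN_WEIGHTS.items()
    )
    return weighted_sum / total_weight


# Hypothetical data zone: most deprived on income, least on the rest.
example = simd_score({
    "current_income": 100.0,
    "employment": 0.0,
    "health": 0.0,
    "education": 0.0,
    "access": 0.0,
    "housing": 0.0,
})
```

Normalising does not change the ordering of data zones, so the ranks derived from the scores are the same whether or not the division is applied.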
The larger the SIMD 2004 score, the more deprived the data zone. However, data zones should be compared using the relative order of their ranks rather than the size of their scores. It is not correct, for example, to say that data zone X is twice as deprived as data zone Y because the SIMD score for X is 50 and that for Y is 25. This is due to the transformation of the data that takes place to enable a domain score to be produced. It is equally not true to say that a data zone of rank 50 is twice as deprived as a data zone of rank 100. However, a data zone of rank 75 is more deprived than a data zone of rank 125.