5. Shrinkage
Alternatives to the original shrinkage method were evaluated by application to the raw indicator data, and, where computationally feasible, by application to the 1,000 simulated datasets. The following alternative methods were evaluated.
5.1. Original Method
The original shrinkage method is used as proposed for earlier versions of the SIMD [5], where for data zone j in LA i, the indicator variable $x_{ij}$ is shrunk towards the LA average, $\bar{x}_i$, by the formula

$$\hat{x}_{ij} = \frac{x_{ij}/s_{ij}^2 + \bar{x}_i/t_i^2}{1/s_{ij}^2 + 1/t_i^2},$$

where $s_{ij}^2$ is the within-data zone variance (i.e. sampling variance) of $x_{ij}$, and $t_i^2$ is the between-data zone variance of $x_{ij}$ within the ith LA.
In other words, $\hat{x}_{ij}$ is a weighted mean of $x_{ij}$ and $\bar{x}_i$, with weights proportional to the reciprocals of the within- and between-data zone variances of $x_{ij}$ respectively. In data zones where $x_{ij}$ is measured imprecisely, $s_{ij}^2$ is large, and $x_{ij}$ is therefore given little weight compared to $\bar{x}_i$ and the indicator is shrunk by a large amount. Where $x_{ij}$ is measured precisely, $s_{ij}^2$ is small, so that $x_{ij}$ is given relatively more weight and less shrinkage occurs.
For the majority of indicator variables, $x_{ij}$ is expressed as the logit of the observed rate in data zone j of LA i, and $\bar{x}_i$ is the logit of the rate over the whole of the ith LA, rather than the mean of the $x_{ij}$. The shrunken values, $\hat{x}_{ij}$, are transformed back to the original scale, though this is academic since they are ranked and transformed again before being included in a factor analysis. For the Pupil Performance at SQA Stage 4 indicator, $x_{ij}$ and $\bar{x}_i$ represent mean values, and $s_{ij}$ is the standard error of $x_{ij}$, calculated from the within-data zone (i.e. between-pupil) variance of SQA scores.
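As an illustration only, the weighted mean described above can be sketched in Python. The function and argument names (shrink_univariate, s2, t2, target) and all numerical values are hypothetical and are not taken from the SIMD implementation; the sketch assumes the sampling variances and the between-data zone variance are already available on the logit scale.

```python
import numpy as np

def logit(p):
    """Log-odds of a rate p, with 0 < p < 1."""
    return np.log(p / (1.0 - p))

def inv_logit(z):
    """Transform a logit back to the original rate scale."""
    return 1.0 / (1.0 + np.exp(-z))

def shrink_univariate(x, s2, t2, target):
    """Precision-weighted mean of each data zone value and the LA value.

    x      : data zone indicator values (logits for rates)
    s2     : within-data zone (sampling) variances, one per data zone
    t2     : between-data zone variance within the LA
    target : LA-level value towards which x is shrunk
    """
    x = np.asarray(x, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    w_zone = 1.0 / s2   # weight on the data zone's own value
    w_la = 1.0 / t2     # weight on the LA value
    return (w_zone * x + w_la * target) / (w_zone + w_la)

# Hypothetical example: three data zones with the same observed rate (0.30)
# in an LA whose overall rate is 0.10, measured with differing precision.
rates = np.array([0.30, 0.30, 0.30])
s2 = np.array([0.01, 1.0, 100.0])   # precise, moderate, imprecise
shrunk_rates = inv_logit(
    shrink_univariate(logit(rates), s2, t2=1.0, target=logit(0.10))
)
```

In this made-up example the precisely measured data zone stays close to its observed rate, while the imprecisely measured one is pulled almost all the way to the LA rate.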
5.2. Alternative Shrinkage Units
5.2.1. No Shrinkage
Using no shrinkage can be viewed as an extreme form of shrinkage in which the unit for shrinkage is the data zone. Since the unit of analysis is the same as the unit of shrinkage, no shrinkage takes place.
5.2.2. National Shrinkage
One concern about the original shrinkage method was that small pockets of deprivation within otherwise affluent areas could be masked by shrinkage, compared to similarly deprived pockets within areas with generally high levels of deprivation. Shrinkage towards a single higher-level unit, i.e. the national average, might avoid this.
5.2.3. Intermediate Geography
On the other hand, LA areas might be considered too coarse a geography to use for shrinkage, particularly since the index is often used to describe the share of deprived data zones across LAs or sub-LA areas. Also, the national shrinkage method might put too much weight on the national average rather than the data zone indicator value, thereby masking small pockets of deprivation. The IG provided by SNS was therefore used as an area definition for shrinkage. This geography is an aggregation of data zones within local authorities and each contains between 2,500 and 6,000 people.
5.2.4. Rurality Groups
An alternative to using geographical areas to define shrinkage units is to find some other classification that groups similar data zones together. For this purpose, the SE Urban/Rural Indicator was used. This classifies data zones into the following groups:
1. Large Urban Areas - settlements of over 125,000 people (2,432 data zones)
2. Other Urban Areas - settlements of 10,000 to 125,000 people (1,892 data zones)
3. Accessible Small Towns - settlements of between 3,000 and 10,000 people and within 30 minutes drive of a settlement of 10,000 or more (666 data zones)
4. Remote Small Towns - settlements of between 3,000 and 10,000 people and with a drive time of over 30 minutes to a settlement of 10,000 or more (189 data zones)
5. Accessible Rural - settlements of less than 3,000 people and within 30 minutes drive of a settlement of 10,000 or more (930 data zones)
6. Remote Rural - settlements of less than 3,000 people and with a drive time of over 30 minutes to a settlement of 10,000 or more (396 data zones)
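The alternative shrinkage units in this section differ only in how data zones are grouped. A minimal Python sketch, under stated assumptions, makes this concrete: the same precision-weighted mean is applied within whatever grouping label is supplied (LA, a single national group, intermediate geography, or rurality class). The function name shrink_by_group, the use of the unweighted group mean as the shrinkage target, and the crude moment estimate of the between-data zone variance are all assumptions made for illustration, not the estimators used in the SIMD.

```python
import numpy as np

def shrink_by_group(x, s2, groups, min_t2=1e-6):
    """Shrink each data zone value towards the mean of its group.

    x      : data zone indicator values (e.g. logits)
    s2     : sampling variance of each data zone value
    groups : one label per data zone defining the shrinkage unit
    min_t2 : floor for the between-data zone variance (assumption)
    """
    x = np.asarray(x, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        idx = groups == g
        target = x[idx].mean()  # assumed target: unweighted group mean
        # Assumed estimate: observed variance minus mean sampling variance
        t2 = max(x[idx].var() - s2[idx].mean(), min_t2)
        w_zone, w_grp = 1.0 / s2[idx], 1.0 / t2
        out[idx] = (w_zone * x[idx] + w_grp * target) / (w_zone + w_grp)
    return out

# Hypothetical data: four data zones in two LAs
x = np.array([-2.0, -1.5, 0.5, 1.0])
s2 = np.full(4, 0.02)

by_la = shrink_by_group(x, s2, groups=["A", "A", "B", "B"])  # LA units
national = shrink_by_group(x, s2, groups=[0, 0, 0, 0])       # one national unit
none = shrink_by_group(x, s2, groups=[0, 1, 2, 3])           # unit = data zone
```

When every data zone is its own group the target equals the observed value, so the values are returned unchanged, which corresponds to the "no shrinkage" case of 5.2.1.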
5.3. Multivariate Shrinkage
Multivariate shrinkage performs shrinkage on sets of variables, with the objective of improved estimation by using information about the correlation between variables within shrinkage areas. Within the SIMD 2004 algorithm, there is the potential to apply multivariate shrinkage methods to up to 5 sets of indicators:
- the 38 mortality variables contributing to the CMF indicator;
- the 36 illness variables contributing to the CIF indicator;
- the 5 remaining health domain indicators;
- the 15 variables contributing to the Working Age Adults with no Qualifications indicator;
- the 4 remaining education domain indicators.
Two methods for shrinking groups of indicator variables have been considered: the first is a multivariate extension of the univariate method currently used; the second applies a random effects model to groups of indicator variables to estimate components of variation and provide shrunken estimates.
These methods are much more computationally intensive than univariate shrinkage of each indicator in turn; consequently these methods can only be applied to the observed indicator data, and not the 1,000 simulated indicator datasets.
5.3.1. Multivariate Analogue to Univariate Method
Longford [13] presents a multivariate shrinkage estimator which combines information across outcome variables or subpopulations. This estimator is shown to give superior estimation of local area rates when compared to univariate shrinkage. Multivariate shrinkage performs best when area level means are highly correlated and one or more variables have small sampling and between-area variances.
Multivariate shrinkage is analogous to the univariate shrinkage used in the SIMD 2003 [5]. Assume that a group of indicator variables is being shrunk in each data zone towards the LA average. If we consider that in data zone j within LA i there is a vector of indicator variables, $\mathbf{x}_{ij}$, with $\bar{\mathbf{x}}_i$ representing the LA average, then the shrunken indicator vector can be defined* as

$$\hat{\mathbf{x}}_{ij} = (I - V_{ij}D_{ij}^{-1})\,\mathbf{x}_{ij} + V_{ij}D_{ij}^{-1}\,\bar{\mathbf{x}}_i,$$

where $V_{ij}$ is the sampling covariance matrix of the $\mathbf{x}_{ij}$, $D_{ij} = V_{ij} + S_i$, where $S_i$ is the between-data zone covariance matrix of indicators within the ith LA, and $I$ is the identity matrix. For indicators expressed as rates, the $\mathbf{x}_{ij}$ would represent logits, and the $\hat{\mathbf{x}}_{ij}$ could be transformed back to the original scale if desired.
* These results make the assumption that the data zone populations, as a fraction of the total LA population, are close enough to zero to be negligible, and that the sampling variance at LA level is zero. Though not explicitly stated, these assumptions are also made in the univariate case.
However, the indicator data are not collected with a view to estimating $V_{ij}$. The diagonal elements of $V_{ij}$ can be estimated in the same way as for univariate shrinkage, by assuming a Binomial distribution for indicators expressed as rates, or from the within-data zone variance of SQA scores. The off-diagonal covariance terms are not estimable. For the purpose of the current project, these terms are assumed to be zero, and the potential of the method to give improved performance compared to univariate shrinkage rests in the use of $S_i$ to estimate between-data zone covariation.
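A minimal Python sketch of multivariate shrinkage with a diagonal sampling covariance matrix is given below, for illustration only. It assumes the estimator takes the form x - V D^{-1} (x - target) with D = V + S, which is the matrix counterpart of the univariate weighted mean; the function name shrink_multivariate and all numerical values are hypothetical.

```python
import numpy as np

def shrink_multivariate(x, v_diag, S, target):
    """Shrink a vector of indicators for one data zone towards the LA vector.

    x      : indicator vector for the data zone (logits for rates)
    v_diag : sampling variances; off-diagonal sampling covariances
             are assumed to be zero, as in the text
    S      : between-data zone covariance matrix within the LA
    target : LA-level vector towards which x is shrunk
    """
    x = np.asarray(x, dtype=float)
    target = np.asarray(target, dtype=float)
    V = np.diag(np.asarray(v_diag, dtype=float))
    D = V + np.asarray(S, dtype=float)
    # V D^{-1} (x - target), via a linear solve rather than an explicit inverse
    pull = V @ np.linalg.solve(D, x - target)
    return x - pull

# Hypothetical two-indicator example
x = [1.0, 1.0]
target = [0.0, 0.0]
v_diag = [1.0, 1.0]

S_uncorrelated = np.diag([1.0, 4.0])
S_correlated = np.array([[1.0, 1.8],
                         [1.8, 4.0]])   # between-zone correlation of 0.9

independent = shrink_multivariate(x, v_diag, S_uncorrelated, target)
joint = shrink_multivariate(x, v_diag, S_correlated, target)
```

With a diagonal S the result reduces to indicator-by-indicator univariate shrinkage (here 0.5 and 0.8); with non-zero off-diagonal terms in S the shrunken value of each indicator also depends on the other indicator, which is where any gain over the univariate method would arise.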
5.3.2. Random Effects Models
In principle, multivariate shrinkage can be performed by fitting a random effects model to a set of indicator variables. To illustrate, this will be carried out using the 5 health domain indicator variables of alcohol-related discharges, drug-related discharges, emergency admissions, prescriptions for anxiety, depression or psychosis, and low birthweight singleton births.
Modelling will be carried out using the software package MLwiN [12]. A three-level model is fitted for indicator variables within data zones within LAs. A more stable model fit is achieved by treating each indicator variable as having a Poisson, rather than a Binomial, distribution. The model can be specified as:

$$y_{ijk} \sim \mathrm{Poisson}(n_{ijk}\lambda_{ijk}),$$

where $y_{ijk}$ is the value of the ith indicator variable in the jth data zone within the kth LA, $n_{ijk}$ denotes the corresponding denominator and $\lambda_{ijk}$ the Poisson rate;

$$\log \lambda_{ijk} = \beta_{1jk}x_{1ijk} + \beta_{2jk}x_{2ijk} + \dots + \beta_{5jk}x_{5ijk},$$

where $x_{1ijk}$ is a dummy variable for the 1st indicator, $x_{2ijk}$ a dummy for the 2nd indicator, and so on;

$$\beta_{ijk} = \beta_i + v_{ik} + u_{ijk},$$

where, for the ith indicator variable, $\beta_i$ is the expected log rate over the population, $v_{ik}$ is the random effect of the kth LA and $u_{ijk}$ is the random effect of the jth data zone. The $\mathbf{u}_{jk}$ and $\mathbf{v}_k$ are specified to have multivariate Normal distributions with zero mean and covariance matrices that can be estimated:

$$\mathbf{u}_{jk} \sim N(\mathbf{0}, \Omega_u), \qquad \mathbf{v}_k \sim N(\mathbf{0}, \Omega_v).$$

The predicted values for each indicator variable, $\hat{\lambda}_{ijk}$, are then extracted and form the shrunken estimates.
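To show how a shrunken estimate follows from such a model, the short Python sketch below combines a population log rate with an LA effect and a data zone effect on the log scale, as the description of the model implies. It is not MLwiN code: the names beta, v and u and all numerical values are hypothetical stand-ins for estimates that the fitted model would supply.

```python
import numpy as np

def predicted_rate(beta, v, u):
    """Predicted Poisson rate from additive effects on the log scale.

    beta : expected log rate over the population for one indicator
    v    : estimated random effect of the LA
    u    : estimated random effect of the data zone
    """
    return np.exp(beta + v + u)

# Hypothetical estimates for one indicator in one data zone
beta = np.log(0.02)   # population rate of 2%
v = 0.10              # LA slightly above the population level
u = -0.25             # data zone below its LA

rate = predicted_rate(beta, v, u)
expected_count = 1500 * rate   # hypothetical denominator of 1,500 people
```

Because the random effects are estimated jointly with their covariance matrices, the predicted rate for a data zone with little data lies closer to its LA and population levels than the raw observed rate would.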
5.4. Spatial Shrinkage
The random effects model described previously assumes variability between data zones and between higher-level units, with the result of estimating these random components being the production of shrunken estimates of the average value in each data zone. Spatial shrinkage performs a similar function, but instead of shrinking the observed indicator values in each data zone to the average value over some fixed set of data zones, shrinkage is carried out by combining the observed value with the average value amongst data zones that are geographically close to that data zone. In the example to be shown here, "geographically close" is defined to mean those data zones that border onto the chosen data zone.
Under the framework of Besag, York and Mollié [16], considering an indicator variable, $x_i$, representing the logit of an observed rate in the ith data zone, the statistical model can be represented as

$$x_i = \mu + b_i + h_i,$$

where $\mu$ is the average indicator level nationally (on the logit scale), and $b_i$ and $h_i$ are unobserved quantities that represent the random variation between data zones. The $h_i$ are unstructured, representing extra-Binomial variation, but the $b_i$, if they were known, would show spatial structure, with values in nearby data zones in general more alike than in distant data zones.
To this end, the $h_i$ are assumed to follow a $N(0, \sigma_h^2)$ distribution; the $b_i$ also follow a Normal distribution, with variance $\sigma_b^2/n_i$ and a mean value equal to $\frac{1}{n_i}\sum_{j \in \delta_i} b_j$, where $\delta_i$ indexes those data zones bordering on the ith zone and $n_i$ is the number of such zones.
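The neighbour-based prior for the spatially structured effects can be sketched in Python as follows. This is an illustration under stated assumptions only: the conditional variance is taken to be a common variance divided by the number of bordering data zones, which is the usual intrinsic conditional autoregressive form, and the function name, the adjacency list and all values are hypothetical.

```python
import numpy as np

def neighbour_conditionals(b, neighbours, sigma2_b):
    """Conditional prior mean and variance of each spatial effect.

    b          : current values of the spatial effects, one per data zone
    neighbours : neighbours[i] lists the indices of data zones bordering zone i
    sigma2_b   : variance parameter of the spatial effects (assumed known here)
    """
    b = np.asarray(b, dtype=float)
    # Mean of the effects in bordering data zones
    means = np.array([b[list(nb)].mean() for nb in neighbours])
    # Assumed form: variance shrinks as the number of neighbours grows
    variances = np.array([sigma2_b / len(nb) for nb in neighbours])
    return means, variances

# Four hypothetical data zones arranged in a row: 0 - 1 - 2 - 3
neighbours = [[1], [0, 2], [1, 3], [2]]
b = [0.0, 1.0, 2.0, 3.0]

means, variances = neighbour_conditionals(b, neighbours, sigma2_b=1.0)
```

Each effect is centred on the average of its bordering data zones rather than on a fixed LA or national value, which is what distinguishes spatial shrinkage from the fixed-unit methods described earlier.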
This model can be fitted within a Bayesian framework; it is specified quite simply using the WinBUGS software [11], but the computational time can be high, and as such it could be applied only to the observed indicator data, not the simulated datasets, within the timeframe available. The process involves repeated sampling from the conditional distribution of each model parameter in turn, including each $b_i$ and $h_i$. Given enough time, the average values of all parameters will converge to their posterior means, but care must be taken to ensure that convergence has occurred. The method does, however, allow otherwise intractable models such as the spatial model described here to be fitted.