National Health Demonstration Projects - Evaluation Task Group Review October - December 2003: Final Report

Section 3: Evaluation

In this section we first consider some general issues regarding the relationship between policy making and evaluation (3.1). We then describe the approaches adopted by the external evaluations of the demonstration projects (3.2) and the internal monitoring and evaluation work (3.3), and finally consider the relationship between the two (3.4).

3.1 Approaches to evaluation

Evaluation is a form of applied research concerned with assessing the results, impacts and/or outcomes achieved by some form of intervention (whether this be a project, a programme, an institution or a policy) in order to inform judgements about that intervention. While there are many different types of evaluation and methodologies, essentially there are two broad approaches:

  1. Evaluations concerned with proving effectiveness; these focus on the achievement of aims/objectives and impact/outcomes and on explaining success/failure, and are analysis-oriented and framed within agendas concerned with accountability and knowledge-building.
  2. Evaluations concerned with improving the implementation of a programme or policy, or strengthening institutions, communities or networks; these are action-oriented and often framed within agendas concerned with development and empowerment.

(Chelimsky, 2001; Stern, 2004)

Combining these two approaches within a single evaluation can lead to tensions since not only will the basic purpose of the evaluation differ, but there are also differences in the epistemological and methodological approaches adopted, and the relationships between the evaluators and those involved in programme implementation (independent observation and analysis versus active engagement).

One major consideration in the selection of appropriate evaluation approaches and designs is the stage of the programme or policy cycle - planning, development, implementation. The alignment of the evaluation focus with the stage in the policy/programme cycle is formalised within many evaluation frameworks. For example, the European Commission differentiates between ex ante evaluation (sometimes called 'appraisal') and impact assessment, which are undertaken at the development stage, and interim and ex post evaluation, which are undertaken once the intervention has been implemented (European Commission, 2003). Within the current UK (evidence-based) policy context, and certainly within public health/health improvement, there is also an emphasis on the use of systematic reviews of results from evaluation studies to inform the planning stage in the programme/policy cycle and to ensure that proposals are evidence-based. This is formalised within the HEBS (now NHS Health Scotland) evaluation framework for health promotion (HEBS, 1999; Wimbush & Watson, 2000).

Implementing this kind of systematic, staged approach to evaluation can be difficult. Many government programmes are funded with the expectation of immediate implementation and delivery within a fixed period (e.g. three years), with little allowance for an initial developmental/design phase. One implication is that evaluations of programme effectiveness are commissioned at too early a stage. There is seldom consideration of the need for an initial phase of formative evaluation while the programme is being developed and set up, so that interventions can be piloted for their feasibility, acceptability and likely effectiveness within a particular local context, and thereafter adjusted to improve their chances of effectiveness. This exactly describes the situation in which the DPs, and their evaluations, developed.

3.2 External evaluation

Among the bids for external evaluation, two contrasting models were proposed. One of these, stemming from the clinical trial and health services research tradition, employed an objective, 'hands-off' approach to evaluate the outcomes from HR, while the other employed a much more collaborative developmental and 'hands-on' approach, using Theories of Change (ToC), to first clarify the objectives of the SW and HaHP projects and then monitor progress towards these objectives. All evaluations used a quasi-experimental research design as one component.

The adoption of contrasting approaches to evaluation permits an assessment of the strengths and weaknesses of each approach and some of the problems from the viewpoints of both the DPs and the evaluation teams in adopting one model rather than another.

3.2.1 Healthy Respect: evaluation design

In preparing their evaluation of HR, the external team (HR/E) included the top line aims and objectives of HR to improve sexual health for young people in Lothian, each generating a number of 'pre-specified' hypotheses. The evaluation focussed on (a) sexual health outcomes of young people in Lothian, by reference both to routine sexual health data (e.g. teenage conception and termination rates) and to survey data on secondary school pupils' sexual health knowledge, behaviour and uptake of services; (b) the organisation and performance of interagency partners in the provision of sexual health services; and (c) the implementation and process of each HR component project. The evaluation would test the following hypotheses relating to the first two objectives: (a) HR would impact on attitudes and behavioural change (better communication with parents/teachers on sexual health issues, reduction in proportion having underage sex, and increased knowledge and (reported) use of condoms), service access, acceptability and uptake, reduce conception/abortion rates and increase rates of Chlamydia testing; (b) HR would increase interaction and networking between service providers to the perceived benefit of clients. The third objective (c) focussed on the processes by which these changes were hypothesised to occur.

To address these objectives/hypotheses, HR/E proposed the following evaluation, to be conducted over 4 years from November 2000 to November 2004: (a) a quasi-experimental 'before-after design' comparing young people in the intervention area (Lothian) with a comparison area (Grampian), selected both for practical reasons and to represent another East coast region (thus avoiding west of Scotland cultural and religious complications), supplemented by qualitative data derived from focus groups; and (b) a mapping exercise of sexual health and related services combined with interviews with agency personnel. For the school surveys under (a), representative samples of S3/S4 pupils in each region would be selected, and power calculations (based on responses to the Scottish WHO HBSC survey [Todd et al., 1999]) indicated that around 2000 boys and 2000 girls were required in both intervention and comparison areas to demonstrate an effect (e.g. on underage sex). To address objective (c), HR/E proposed qualitative methods (e.g. interviews and focus groups with clients and service providers), each designed to identify process measures and implementation of the individual projects. HR/E were clear from the outset that, with multiple concurrent initiatives, it would not be possible to attribute an intervention effect on population-level outcomes to individual components of HR.
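
One common way to summarise a before-after design with a comparison area is a difference-in-differences contrast, in which the change observed in the comparison area is subtracted from the change observed in the intervention area. The sketch below is illustrative only: the function name and the prevalence figures are hypothetical, they are not results from HR/E, and the evaluators' own analysis plan may have differed.

```python
def difference_in_differences(pre_intervention, post_intervention,
                              pre_comparison, post_comparison):
    """Change in the intervention area minus change in the comparison area."""
    change_intervention = post_intervention - pre_intervention
    change_comparison = post_comparison - pre_comparison
    return change_intervention - change_comparison


# Hypothetical prevalences of a reported behaviour (NOT data from the evaluation).
effect = difference_in_differences(
    pre_intervention=0.20, post_intervention=0.16,
    pre_comparison=0.20, post_comparison=0.19,
)
print(f"Estimated intervention effect: {effect * 100:+.1f} percentage points")
```

A contrast of this kind removes changes common to both areas, but it cannot separate the contribution of individual HR components, which is the limitation HR/E acknowledged.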

3.2.2 Healthy Respect: a 'hands-off' approach to evaluation

The underlying model of the HR evaluation was therefore based on the assumption of an intervention with fixed overall HR aims and objectives, from which specific hypotheses could be formulated and which were testable by reference to a design combining a quasi-experimental method to identify an intervention effect on specified outcomes, with various qualitative methods used to illuminate processes which might bring this about. In this model, the independence of the evaluators is imperative with contact between the evaluators and implementers kept to a minimum for data collection purposes. The evaluation team do not seek to influence the direction or development of the intervention, nor is the evaluation open to influence by the project team. While consensual and collaborative working with the HR team is necessary to obtain data, the relationship is otherwise non-interactive. In the revised proposal (August 2000), HR/E specifically identified that one of their goals was to 'avoid contamination of HR' (p.6). Thus, HR/E have not fed back interim results to HR on the grounds that this might influence the direction of the project. An alternative approach such as theory of change was described as being outwith the approach to HR evaluation.

The adoption of the 'hands-off' approach to evaluation used by HR/E seeks to provide an unbiased test of the initial hypotheses. However, there are potential disadvantages:

  • It assumes that the demonstration and component projects' objectives are stable. Any change in objectives, or failure to implement them, makes the project less evaluable. Although identification of such changes is one of the reasons for the qualitative process evaluation and provides evidence about the extent to which the intervention was implemented according to plan, the shifting emphasis of HR towards process rather than outcome measures was identified as a major problem for HR/E. It was perceived as a departure from their original understanding of HR's preparedness to be tested for impact and effectiveness (4.11.03).
  • It assumes objectives are clearly articulated and communicated. This was not the case in the early stages of HR, with both project objectives and management continuity unstable for at least 8 months from August 2000. Against this background, HR/E drew up a Memorandum of Agreement to agree objectives and mechanisms for process evaluation, roles and responsibilities, and respective areas of intellectual property rights. The Memorandum was consistent with the commissioned evaluation (August 2000) in not offering interim feedback.
  • The lack of feedback from the evaluation to HR was experienced as frustrating by project staff, leaving them feeling they lacked guidance and any sense of whether or not they were achieving their objectives.
  • The 'hands-off' model of evaluation also runs the risk of generating a gap in expectations between internal and external evaluation teams. In the early stages of HR, the perception that HR/E had not given sufficient attention to self-esteem led the project manager to develop a separate research proposal to address the issue, which was not subsequently funded. There were many other examples of HR component projects undertaking research (often of dubious quality) to fit in with a culture of evaluation internal to HR. One consequence of this is that internal and external evaluations may produce disparate process findings.

3.2.3 Healthy Respect: assessment

The external evaluation involves several components, including health outcomes of teenage pregnancy, compliance with national recommendations for the detection and management of Chlamydia in the context of the National Chlamydia SIGN Guideline Audit, and service provision for STIs. Central to it is a repeat cross-sectional survey of sexual health knowledge, attitudes and (reported) behaviours among S3/S4 secondary pupils in intervention and control schools (Lothian and Grampian respectively). The first survey, involving 10 Lothian schools (2760 pupils, 80%) and 5 Grampian schools (1501 pupils, 83%), was completed between September and December 2001; the second, in 9 of the 10 Lothian schools and all 5 Grampian schools, was completed during the same period in 2003.

In principle, this design should be able to address the key hypotheses of the evaluation even allowing for the risk of contamination between areas, and the difficulty of causal attribution of effects to the intervention given the multi-faceted nature of HR. The surveys appear to have been conducted efficiently with good response rates within schools (although absentees were not followed up). However, in several respects the design was less than optimal:

  • The original HR/E objective (August 2000) to compare representative samples of schools (and pupils) was not achieved, and in the case of Lothian was not achievable because HR was only operating in selected Lothian schools. The ten intervention schools that 'signed up' to HR were self-selected, i.e. were 'volunteers'. They may represent those most committed to sexual health education, potentially biasing estimates of the intervention effect.
  • The original aim for the control sample was to identify ten schools in Grampian matched by size, rurality and level of deprivation. In the event, only five of the 17 selected and invited secondary schools in Grampian agreed to take part. These schools may not be representative of Grampian schools, nor were they well-matched controls for Lothian schools. This may compensate for the bias in the Lothian sample, but the degree of under or over compensation is impossible to estimate, making interpretation of results difficult and generalisation risky. The potential selection bias and hence representativeness of the schools is an issue that will be addressed explicitly in the final report.
  • The reduction in sample size caused by school recruitment problems necessitated a re-assessment of the power of the sample to detect specified effect sizes. The original estimates (above) were 2000 pupils of each sex in both intervention and control areas; the response to the first survey indicates that this was not achieved, especially in Grampian. The Progress Report (October 2002) notes, however, that the sample sizes achieved in the first survey, judged against a revised sample estimate made before that survey (using a new 2:1 Lothian/Grampian ratio), 'continued to allow detection of an effect size of 4% to 5%' (p.3). This was based on the revised power calculation agreed with the CSO, assuming a 4-5% difference in prevalence of an outcome such as reported experience of sexual intercourse given a baseline prevalence of approximately 20%.
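
To give a sense of what a power calculation of this kind involves, the sketch below estimates the sample sizes needed to detect a 4 or 5 percentage point difference from a 20% baseline with a 2:1 allocation. It is not the calculation agreed with the CSO: the significance level (5%, two-sided), the power (80%), the normal approximation and the function name are all assumptions made here for illustration, and the clustering of pupils within schools, which would increase the required numbers, is ignored.

```python
from math import ceil
from statistics import NormalDist


def sample_sizes_two_proportions(p_comparison, p_intervention,
                                 allocation_ratio=1.0, alpha=0.05, power=0.80):
    """Return (n_intervention, n_comparison) for a two-sided test of two proportions.

    allocation_ratio is n_intervention / n_comparison. Uses the normal
    approximation with unpooled variances; alpha and power are assumed values.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Variance of the difference in proportions, scaled by n_comparison.
    variance = (p_intervention * (1 - p_intervention) / allocation_ratio
                + p_comparison * (1 - p_comparison))
    n_comparison = ceil((z_alpha + z_beta) ** 2 * variance
                        / (p_intervention - p_comparison) ** 2)
    return ceil(allocation_ratio * n_comparison), n_comparison


# Baseline prevalence of about 20%, reductions of 4 and 5 percentage points,
# and a 2:1 intervention/comparison ratio as described in the text.
for difference in (0.04, 0.05):
    n_int, n_cmp = sample_sizes_two_proportions(
        p_comparison=0.20, p_intervention=0.20 - difference, allocation_ratio=2.0)
    print(f"difference {difference:.0%}: intervention {n_int}, comparison {n_cmp}")
```

The smaller the difference to be detected, the larger the required sample, which is why a shortfall in recruited schools bears directly on the effect sizes the survey can detect.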

We do not yet have any results from HR/E on the school-based intervention. It seems likely, however, that the problems of school recruitment may make these results difficult to interpret. If there is no difference between Lothian and Grampian pupils (controlling for confounders), it could be attributable to a number of factors. If there is, it may not be generalisable to a wider population because of potential biases introduced by the schools selected.

3.2.4 Healthy Respect: process evaluation

Process evaluation was a limited part of the original research proposal from HR/E, but was extended in the revised version at the request of the Scottish Executive. It includes:

  • Assessing the effectiveness of interagency working using descriptive before-and-after inventories and mapping of service provision, partnerships and networks and through observation of professionals' contacts and activities through diary keeping.
  • Describing the implementation process of HR's 12 component projects by identifying key process indicators of implementation for each of the projects. The views of projects' clients and providers are sought to identify best practice, perceived impact and acceptability.

The process evaluation was intended to be used to identify, understand and interpret any observed changes in outcome measures, making associations between process and outcome where strict attribution is not possible. While the process evaluation and context mapping helped keep the evaluation team up to date with the evolution of the demonstration project, the team sought to maintain their independent position by avoiding feedback from the process evaluation.

At the time of writing, there have been no reports from the process evaluation of HR. The contribution of this element of the Healthy Respect evaluation to understanding causal attribution is likely to be weak without an overall programme theory to make the links between goals, individual project activities and outcomes. The process evaluation will only aid in assessing the effectiveness of each component project in terms of its own objectives, against a set of 7 criteria derived from the literature on good practice in health promotion.

3.2.5 Starting Well: evaluation design

3.2.5.1 Theory of change - a 'hands-on' approach to evaluation

In the (revised) bid to evaluate SW, the external team (SW/E) made a distinction between the criteria and methodology used to evaluate an intervention trial and those appropriate for a 'demonstration project', the rationale for the latter being as much about 'improving the intervention as proving that it works' (p.7). The idea that the evaluation should shape the direction of the intervention contrasts with the approach described above. The rationale for this more interactive approach is a recognition that 'real-life' interventions rarely stand still and often depart from their initial objectives, as a consequence of external events, such as policy changes or service reorganisation, or internal changes in direction initiated by those implementing the intervention. Indeed, initial objectives themselves may not be clear. From this perspective, involvement of the evaluator in the development and course of the DP is desirable since it provides a means of identifying what is being evaluated. The method proposed to address these issues in the SW (and HaHP - see below) evaluations was the 'theory of change' (ToC) (Fulbright-Anderson, 1998; Judge & Bauld, 2001).

ToC seeks a better understanding of the processes in an intervention that might produce predicted change. The first step is to identify the connections made by key stakeholders between DP inputs and desired outcomes to make an assessment of the likelihood that the goals can be achieved. Thus, in this first stage, through interaction between SW and SW/E (interviews/focus groups with steering group members), the aim was to clarify objectives. After the initial stage, the focus of ToC switches to the documentation of processes designed to assess whether intended actions take place and whether predicted changes are observed.

3.2.5.2 Process Evaluation

In the case of SW/E, the process evaluation identified three key issues that were an integral part of SW's ToC: the extent to which intensive home visiting led to the development of therapeutic alliances between families and their home visitors; the implementation issues involved in developing a skill mix approach to home visiting; and the degree to which intensive home visiting at an individual family level led to improved community and strategic responses to child and family health problems. This took the form of detailed case studies with 59 SW families and associated Health Visitors (HVs) and lay support workers in order to evaluate the extent to which specified components of the home intervention (e.g. the Family Health Plan) were being achieved. It also involved detailed documentation of what actually happened during the implementation of SW in respect of all its components. In a later document (Shute & Judge, in press), these were reduced to 3 central components: (a) case studies (as before); (b) formation and development of a staff team of professionals and paraprofessionals; and (c) influence of identified health needs on local and higher-level planning. While these methods and measures would normally be part of the process evaluation in any intervention, the 'theory of change' involves the systematic documentation of change in intervention components as they relate to intended outcomes. It is claimed that this facilitates a more sensitive analysis of 'causal' attribution than is often the case, enabling better identification of the reasons for both intervention success and failure.

3.2.5.3 Quasi-experimental study

The use of ToC, however, was intended to complement rather than replace a traditional quasi-experimental approach. SW/E proposed a comparison (not unlike that of HR/E) of two cohorts, one in the SW (intervention) area, the other in a comparison area with broadly comparable demographic and socio-economic characteristics. The SW/E cohort is a time-specified subset of all SW participants, defined (as is the comparison cohort) as all babies born between June 2001 and June 2002. It was initially proposed to recruit families via HVs at the point at which they first made contact (assumed to be in the antenatal period) and follow them up for a period of 30 months (contacts at birth, 6 weeks, 6 months, 18 months and 30 months). However, the final (30 month) follow-up was abandoned. Thus, in addition to implementing the SW programme, HVs would conduct both recruitment to and administration of SW/E instruments, supplemented by SW/E interviewers in the comparison area. In general, this was what happened: recruitment and some instruments (e.g. postal questionnaires) were administered by HVs, while trained research nurses additionally conducted home interviews at 6 and 18 months, including the administration of the HOME measure.

A range of outcome measures was proposed for each point of contact including maternal and family characteristics (antenatal), birthweight/gestation (birth), postnatal depression (6 weeks), maternal diet, breastfeeding, immunisations, HOME score (6 and 18 months) etc. Reflecting the emphasis on parenting as a key outcome, the HOME score (with 6 sub-scores) was seen as the core outcome measure. The projected number of families in the SW area was 600-700 per year (1500-2000 over 3 years). Two options were given for the comparison area involving (a) a one year or (b) three year cohort sample respectively. Power calculations were given in relation to predicted effect sizes on HOME and child accidents, the latter being used to demonstrate that in option (a) the sample size would not be adequate.

3.2.6 Starting Well: assessment

3.2.6.1 Theory of Change

ToC is an interesting development in evaluation methodology and seems particularly appropriate when the initial objectives of a project are unclear. However, the following points are worth noting:

  • ToC has as one of its overall aims the improvement of interventions. It does not start with objectives as defined by a project (as with HR/E) but seeks via interaction with DPs to clarify objectives and possibly redefine them. This is a departure from the usual scientific principle of independence and might in various ways compromise the implementation of the DP. For example, DPs might become over-dependent on evaluators for direction and guidance. This is unlikely to be a problem in the formative stages of a project when objectives are being formulated, but if objectives (and related processes) continue to change in a 'mature' project, it becomes more difficult to know what is being evaluated. ToC would, however, facilitate an understanding of how and why such change occurred (e.g. the DP became committed to a new approach), but what then is the 'project' that is being evaluated overall?
  • Inasmuch as ToC leads to a change in objectives, there is a risk that the DP shifts away from the evidence base it rested on in the first place, thereby inadvertently reducing the likelihood of obtaining an intervention effect.
  • Any change in project objectives resulting from ToC that occurred after an evaluation (e.g. before/after survey) had been commenced would potentially render that evaluation flawed. This highlights the fact that ToC is of particular value in the formative stages of a project and should not intervene after a project is up and running. Beyond the initial stages, ToC functions much like any other process measure (a point made by HR/E).
  • ToC may place demands on DPs to become involved in procedures they may not have expected and might not want. The view from HaHP would have been that this was another set of controls while SW did not regard the ToC feedback as particularly useful.

Since both SW and HaHP process evaluations used ToC methods, a combined discussion of how well ToC worked in practice is included in section 3.2.8 below.

3.2.6.2 Quasi-experimental study

As in the case of HR/E, the choice of a quasi-experimental design was an appropriate methodology to test for differences in outcomes between intervention and comparison areas. There were some modifications to the design resulting both from delays in implementing SW and from the practicalities of conducting all the proposed follow-ups within the designated timeframe. Thus, the number of SW/E contacts has been limited to three points (10-14 days after birth, 6 months and 18 months) and the availability of data has been limited by the way SW was itself implemented (most notably in the lack of contact with families in the ante-natal period). However, while these problems restrict the capacity of the quasi-experimental design to deliver, they do not invalidate it. The extent to which it has delivered depends on the following considerations:

  • Recruitment to the survey was more problematic than anticipated. Although 98% of all families participated in SW, HVs were less successful in recruiting them to SW/E. In the SW areas, of 604 births in the specified time period, 375 (62.5%) mothers agreed to take part; in the comparison area, of (an estimated) 600 births, 262 (43.7%) consented to participate (total n=637), possibly reflecting a lower level of commitment among non-SW HVs. Consequently, recruitment of controls was later extended to health visiting teams in the West of Glasgow. It is not clear how this affected the representativeness of the control sample.
  • Response rates to each of the SW/E contacts have been lower than expected. In the first (baseline) contact, involving a postal questionnaire, data were only available on 447/637 (70%) of cases (71% SW area, 69% comparison area); at six months (for interview) the comparable figure was better, at 493/637 (77%) of cases (80% SW, 73% comparison). However, an analysis of six month outcomes (Shute and Judge, in press) was based on all those with baseline and six month data, reducing the sample to 359/637 (57%) or 30% of the population of families in both areas. The number of cases available for analysis (SW 213, comparison 146) was below the earlier estimate of the numbers (220 families per area) needed to detect an intervention effect, raising the possibility that relevant effects would fail to be detected.
  • These problems raise the question whether each sample is representative of its respective population, previously regarded as 'a prerequisite for this evaluation aim' (First Annual report, p.5). This issue was not addressed in the paper analysing six month outcomes (Shute et al., submitted) where both areas were reported to have similar proportions of lone mothers and non home-owners to the Glasgow population. This does suggest the possibility that the SW/E samples were biased towards less deprived families in both the SW and comparison areas. The issue was highlighted in the 18 month progress report (e.g. within area comparisons of opt-ins and opt-outs) and should also be addressed in the final report.
  • It was also evident from the six month data that outcomes for ethnic minorities differ (higher maternal depression, poorer HOME scores), which raises the question of the applicability of the SW intervention to this particular sub-group or the validity of these instruments in a transcultural context. Although this can be controlled for in multivariate analysis, a sensitivity analysis to determine how it impacts on the overall intervention effect would be useful.
  • Throughout its development, SW has evolved, to the extent that it is no longer regarded as a project so much as an approach (Ross & de Caestecker, submitted). This has, as acknowledged, made it extremely difficult to evaluate since families have not been exposed to a constant intervention but rather to different types and levels of intervention. Thus, families recruited in the preliminary phase may not have received the full intervention, while in later phases the intervention may have become diluted. Unless this is taken into account, it may not be possible to detect the effect of SW at its optimum. This issue was alluded to by Shute & Judge, who used the concept of the 'mature' DP to indicate the point at which a project is fully up and running and to some extent constant.
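
The attrition described in the first two points above can be traced step by step from the counts quoted there. The sketch below simply recomputes the proportions from those counts; the variable names are illustrative, the comparison-area birth figure is an estimate (as stated in the text), and recomputed percentages may differ slightly from the rounded figures quoted.

```python
# Counts quoted in the text; the comparison-area births figure is an estimate.
births = {"SW": 604, "comparison": 600}
consented = {"SW": 375, "comparison": 262}

# Cases with data available at each stage (both areas combined).
stages = {
    "baseline postal questionnaire": 447,
    "six month interview": 493,
    "both baseline and six month data": 359,
}


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_births = sum(births.values())
total_consented = sum(consented.values())

for area in births:
    print(f"{area}: {consented[area]}/{births[area]} consented "
          f"({percent(consented[area], births[area])}%)")

for stage, cases in stages.items():
    print(f"{stage}: {cases}/{total_consented} of consenting families "
          f"({percent(cases, total_consented)}%), "
          f"{percent(cases, total_births)}% of all births")
```

Laying the figures out this way makes clear that the analysed sample is a fraction of a fraction: losses at consent and at each data collection point compound.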

In summary, although SW/E have conducted this part of the evaluation to a high standard, problems of recruitment have limited its capacity to deliver. It is not yet clear how representative the samples are of their respective populations, nor whether the numbers are adequate to demonstrate an intervention effect on parenting. In retrospect, it may have been better to seek consent from mothers in the ante-natal period (with additional data collection benefits) and to collect baseline data via interview rather than postal questionnaire. It is also possible that the full effect of SW has been obscured by changes in the DP over time which might be revealed via identification, and related analysis, of a 'mature' phase.
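
One way to see what the shortfall in numbers means is to compare the smallest standardised difference detectable with the planned sample (220 families per area) against that detectable with the numbers actually analysed at six months (213 and 146). The sketch below is illustrative only: it treats the outcome as a continuous score compared between two groups, and the significance level (5%, two-sided), power (80%), normal approximation and function name are assumptions made here, since the report does not state the basis of the original estimate.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_a, n_b, alpha=0.05, power=0.80):
    """Smallest standardised mean difference detectable by a two-sided z-test.

    alpha and power are assumed values, not taken from the evaluation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(1 / n_a + 1 / n_b)


planned = minimum_detectable_effect(220, 220)   # numbers originally estimated
achieved = minimum_detectable_effect(213, 146)  # cases analysed at six months
print(f"planned sample:  standardised effect of about {planned:.2f}")
print(f"achieved sample: standardised effect of about {achieved:.2f}")
```

Under these assumptions the achieved sample can only detect a somewhat larger effect than the planned one, so a modest real effect of SW could go undetected.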

3.2.7 HaHP: evaluation design

The external evaluation of HaHP involves 4 separate but linked approaches that are intended to give a balanced perspective on the overall processes, impacts and outcomes of the demonstration project.

These are:

  • Theory of Change
  • A quasi-experimental survey
  • Contextual analysis
  • A range of interrelated studies of key settings and organisations (the community, primary care and the local authority)

For a variety of reasons, planned and unplanned, both the HaHP project itself and the detailed format of the evaluation have developed and altered over the period of the project. For example, the interrelated studies were introduced to strengthen the evaluation once it became clear that problems with the surveys would limit their usefulness. The integrated case studies focused on two settings (primary care and community) and one organisation (local authority), and looked at the extent of service development and the impact of HaHP on professionals and/or agenda change, at both strategic and operational levels.

Many of the general points made in relation to SW/E also apply to the evaluation of HaHP (HaHP/E). Without the final report we cannot judge the success of the overall approach. Here we consider issues related to the quasi-experimental surveys.

3.2.7.1 Quasi-experimental study

This component of the evaluation aimed to assess the impact of the overall intervention in Paisley. A comparison area, Inverclyde, was identified, with similar population characteristics and geographically adjacent to the study area.

Randomly selected adults aged 20-70, within age and sex quotas, were to be assessed for CHD risk factors and health-related behaviours at the beginning and reassessed at the end of the intervention period in both Paisley and Inverclyde. The assessments consisted of a questionnaire and attendance at a nurse-led clinical assessment.

However, despite major efforts by the evaluators, the response rate for the baseline survey was a disappointing 28% in Paisley and 27% in Inverclyde. Changes in data protection regulations and inaccuracies in the addresses on the community health index caused particular difficulties which could not necessarily have been anticipated. The low response reflects recent experience in health and lifestyle surveys throughout Scotland, where many areas have suffered falling levels of response in the past few years.

The poor response rate and an over-representation of older people and less deprived areas meant that the survey population was not representative of the Paisley and Inverclyde populations. Several options to tackle this problem were considered and the evaluation proposal was revised as a result. The revised approach aims to use secondary data, including monitoring information from within the project.

This is likely to cause different problems and it may prove difficult to get good quality comprehensive data on trends in risk factors and health related behaviour in the general population and in various sub groups such as young people. This has obvious implications for HaHP as a national demonstration project for CHD prevention where any impact of the interventions on mortality would not be seen for some years and some effect on intermediate measures such as risk factors would be expected.

The design of this part of the evaluation was problematic from the start, with most other community based CHD prevention studies being unable to demonstrate an attributable reduction in risk factors in the study areas compared to control areas. In summary, the complex nature of HaHP, the unrealistic timescales for planning and several unforeseen constraints have led to real problems both for the implementation and the evaluation.

3.2.8 Theory of Change in practice

The process evaluation element of the external evaluations of Starting Well and Have a Heart Paisley consisted of two distinct strands:

  • A Theory of Change strand which attempted to explore, surface and develop the 'programme logic' (i.e. the logical connections between the programme's aims, the programme's activities and the intended outcomes) through interviews with key strategic and operational staff and the analysis of key documents
  • A formative evaluation element that examined and described the implementation process of key elements of the DPs using qualitative research methods

Both these elements were intended to provide feedback and learning for the DPs themselves and for the wider policy and practice communities. Interim reports were produced in 2003 based on the early findings from these process evaluation elements (Mackenzie, 2003; Blamey, 2001 and 2003). The contribution of the ToC approach is said to be threefold (Weiss, 1995):

  • Sharpening planning and implementation
  • Facilitating the development of an evaluation framework
  • Reducing problems associated with causal attribution

The extent to which these objectives were realised within HaHP and Starting Well was explicitly addressed in a paper (Mackenzie, M. and Blamey, A., 2004).

a) Sharpening planning and implementation. From the perspective of the HaHP and SW demonstration projects, ToC was 'a dominant element' of the external evaluation but it had mixed results. It was seen as an interesting and helpful tool in terms of project development and providing formative feedback to project management, giving them a mandate to make proposals for future developments. On the other hand, it was also seen as burdensome and the feedback was sometimes too late to inform decisions. An alternative view was expressed by HaHP/E, indicating that the early work in HaHP (Blamey 2001) made clear comment on the lack of evidence-based practice in key areas of HaHP as well as the over-ambitious nature of the plans, yet these issues were not fully addressed by the project.

The ToC approach proved particularly useful for HaHP because of the major difficulties the DP faced in its early stages. The initial plans for HaHP were overly ambitious, and, as a result, timescales for delivery were lengthened and expectations of outputs and outcomes were reduced. There was a lack of initial planning time for such a complex initiative as well as operational problems such as staff recruitment and retention (Blamey 2003). As a result, the theory based approach was very well received by the HaHP project team, who appreciated the support and direction it enabled the team to develop.

For SW, the process of reflection involved was seen as the most useful aspect of the ToC approach. For both SW and HaHP, the main shortcoming was that it was a tool more appropriate to use at the planning stage, before the project started, rather than while the project was in development and maturing. From the perspective of the external evaluation teams, ToC was a helpful tool for surfacing conflicts in project goals, priorities and approaches among key stakeholders, but offered nothing in resolving these.

b) Facilitating the development of an evaluation framework. The ToC was intended to provide a framework of expectations for the evaluation to test. The value of ToC before the start of the projects is to clarify project components, linking them to projected outcomes and thus clarifying the fit between overall project objectives and the overall project 'package'. From the projects' perspective, ToC can and did play a role in project design, in clarifying the project elements and the package as a whole, and in guiding the evaluation. However, the implied sequential process was not apparent; it remained stuck in the project design phase rather than informing the evaluation focus. The main barrier for the projects to making the process work optimally was timing, it being desirable to apply ToC at design stage rather than once a project is up and running. From the evaluators' perspective, on the other hand, it was seen as useful in identifying key questions for the internal evaluation and in helping the external evaluation team to further specify evaluation questions and prioritise the focus of the evaluation (eg roles within the workforce).

c) Attribution - understanding what brought about the observed effects

In order to fully realise the potential of the ToC approach in helping to unravel problems of causal attribution, it has to be carried out in a very intensive way to create a well specified, detailed theory of change. This was not possible to achieve in the context of SW and HaHP which are both complex multi-stranded interventions, developed and implemented in a fast-moving climate where detailed, highly specified planning does not exist and large areas of the projects are not grounded within an evidence base. A further reservation was the linearity of causal effect implied by the ToC approach, whereas in complex systems the synergistic effect of interaction between the individual components needs to be allowed for - a project may be ineffective on its own but be effective in the wider context of the overall programme.

3.3 Internal Monitoring and Evaluation

It was widely acknowledged by the DP teams that the establishment of an effective internal monitoring programme was given insufficient priority within the early stages of developing and implementing the projects. This led to delays in establishing proper performance monitoring systems, compounded by difficulties in recruiting evaluation staff with the appropriate skills and experience.

3.3.1 Healthy Respect

After a difficult beginning with several project management changes, it was only once the current project manager was in post that an internal evaluation function was developed. Standard project management methods have been adopted with an observational, process-focused approach. This involves using pro-formas to collect quarterly 'audit' information from all the constituent projects and providing feedback reports to projects for discussion. The data are analysed by the NHS Board's Health Information Unit and have been used to inform decision-making about continued funding of the projects. The continual demand for monitoring data was seen as imposing a heavy burden on the smaller sub-projects. Schools have been reluctant to provide information on SHARE training coverage due to lack of capacity. Some additional research (eg media evaluation) has been commissioned by the internal evaluation officer.

After initial difficulties, the relationships between Healthy Respect's internal team and HR/E were described as 'good' in terms of collaboration and information sharing. The Memorandum of Agreement between HR and HR/E was seen as very important in delineating the respective roles of internal and external evaluation teams and avoiding duplication. The external evaluation team drafted a pro forma for the individual projects to use for their quarterly 'audit' returns (in lieu of diaries), but the external team have been reluctant to get involved in the projects and interim analyses of their survey data were strictly proscribed.

3.3.2 Starting Well

At the outset, there was little appreciation of the need for and role of the internal evaluation function. It was not until March 2002 (2 years from the start) that Starting Well was able to recruit someone with the appropriate skills for the internal evaluation post. The internal evaluation has produced a monthly management report using Family Health Plan data that is captured and collated centrally on a database in each area. Feeding the information back to the teams has served as a focus for team discussions. There has been an increasing demand for this information and ad hoc analyses have been conducted. The external evaluation team uses this internal monitoring data for the families in their cohort. Issues relating to data protection had to be resolved before data sharing was possible. Other research has also been commissioned by the internal evaluation officer since she has been in post, such as action research on nursing practice, the use of practice guidelines and further work on community development.

3.3.3 Have A Heart Paisley

Similarly, the internal evaluation was severely delayed with HaHP. The project had expected the external evaluation to fulfil all requirements. Thus, the internal evaluation post was created at a junior officer level. Problems mounted due to limited baseline data collected at the start of the project, a low response rate to the baseline survey, problems in recruiting and retaining the internal evaluation post holder and a realisation that the external evaluation role did not encompass internal monitoring and evaluation functions. As the project evolved the interface between internal and external evaluation became more blurred, and a more integrated approach developed. Relationships between the HaHP project and the external evaluation team were described as good in terms of useful feedback and a constant presence at management team meetings.

In addition, in HaHP, some key aspects of the project were not covered by the external evaluation and struggled to get funded evaluation programmes in place. Examples of this were the cardiac rehabilitation programme and the development of a CHD disease register which was intended to provide the basis of a systematic approach to secondary prevention. HaHP submitted unsuccessful bids to the CSO and to the British Heart Foundation for funding for these evaluations. Subsequently, they developed an ambitious internal evaluation that is currently underway.

It is likely that the effectiveness of the cardiac rehabilitation programme and of the disease register in ensuring the systematic implementation of evidence-based interventions will be of particular interest to the NHS throughout Scotland. In retrospect, the effective evaluation of this key element of HaHP should have been given a higher priority from the start and resourced adequately.

3.4 Relationship between internal and external evaluations

The experience of the three internal evaluations highlights the following 3 issues:

Evaluation and implementation role

At the outset, the focus of the DPs was defined as implementation/action and disseminating good practice; research and evaluation was not seen as part of these roles and there were expectations that the external evaluation team would do the research/evaluation. In particular, the requirements for internal monitoring and evaluation were unclear at the outset and lacked designated and clear strategic leadership.

The relationship between internal and external evaluation

There was a lack of clarity at the outset about the need for, and the nature and level of, the internal evaluation function on the part of the DPs and the Scottish Executive. This meant that internal and external roles were not planned as complementary, integrated functions but kept quite separate. From HR's perspective, there had always been doubts about how the internal and external evaluation would fit together, and concerns about what would happen if there were a mismatch between the two. However, by commissioning the external evaluations first and independently from the projects, the internal evaluations were by default left doing everything else. HaHP felt that a clear understanding of what was required from the internal evaluation, and what resources were needed, should have been established at the outset. The lack of integration between internal and external evaluation roles also meant that there was no mechanism for bringing together and synthesising information from the internal and external evaluations. The HaHP team believed that the two must be integrated much better than they have been in the past. Starting Well's view was that it was important for all the learning to be synthesised rather than focusing disproportionately on the results from the external evaluation.

Evaluation capacity and culture within implementing organisations

The whole process of uncertainty and lack of clarity and leadership for the internal evaluation function was compounded by the lack of capacity of some groups of staff to be 'critical practitioners' and to understand and build an internal evaluation role into the planning, development and review cycle. NHS secondary care was one exception, in that they had a strong culture of reflective practice and were developing information systems to support this. Variations in the evaluation culture across sectors were also noted by HR - the voluntary sector were seen to be very familiar with critical reflection and producing reports for funders; local authorities were seen to have an 'inspection culture' and produce committee reports when necessary; while the NHS were seen to have a more 'academic' approach.

Page updated: Thursday, April 07, 2005