Included reports and quality
The original searches retrieved 62,742 unique references and 56 eligible reports. The updated search retrieved 9,709 unique references and nine eligible reports (Fig. 1). In total, 65 reports on 27 studies of 22 interventions were included. Sixteen of these were process evaluations, covering 13 studies and 10 interventions [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. Of these ten interventions, one was delivered to children approximating to English primary school age [5,6,7,8,9,10, 38], six to children of English secondary school age (age 11–18) and three to children whose ages spanned these ranges. Of the thirteen studies, five were from the USA, four from the UK, three from Australia and one from Uganda. Nine reports drew on quantitative and qualitative data, five only on quantitative data and two only on qualitative data. Table 1 summarises the characteristics of process evaluations.
Initial agreement over study quality was high (> 90%). Three studies were judged to be of both high reliability and high utility in addressing our research questions [25, 26, 30, 35,36,37]. One study was judged to be of high reliability and of medium utility in addressing our research questions. One study was judged as medium reliability but low utility. A further study was judged as being of low reliability but of medium utility. Four studies were judged as of low quality and low utility [27, 28, 33, 34]. Three studies were rated as of high [31, 32] or medium reliability, but low utility (Table 2).
Synthesis of evidence on factors affecting implementation
Various themes and sub-themes were apparent in quotes from study participants and author interpretations; their structure is summarised in Table 3. They are presented below, organised according to the constructs from the General Theory of Implementation with which they aligned (these constructs are indicated by inverted commas). References indicate which studies informed which themes.
‘Sense-making’, a process of staff coming to understand the intervention, was a recurrent theme across studies and contributed to the enactment of interventions. Sense-making was reported to accrue over time and pervade all, not just the initial, stages of implementation [22, 24, 26, 30, 35]. Various factors were reported to affect how school staff and students understood intervention resources.
Intervention ‘capability’ to be made sense of
A sub-theme suggested that sense-making could be facilitated by an intervention’s ‘capability’ (i.e. workability). This could be in terms of providing good-quality materials and/or ongoing support in the form of training, external facilitation or coaching [22, 24,25,26, 30, 33,34,35,36]. Materials and resources that included tangible, contextually relevant examples were reported as enabling providers to understand how intervention activities might occur in their setting, for example as reported by one head-teacher in an evaluation of Learning Together rated as of high reliability and usefulness:
“The one thing schools need is a model, of how it’s going to work in the school, in a real-life school, so that they can almost touch it, taste it, feel it, and then start implementing it in their own schools”. (p.39)
Two studies reported that staff were sometimes initially confused by intervention materials or external providers [25, 26]. In the study of the Healthy School Ethos intervention, rated as of high reliability and usefulness, an initial presentation by an external facilitator was reported to have caused staff and students to misunderstand the aims of a whole-school intervention.
School ‘capacity’ to make sense of an intervention
Another sub-theme apparent in one UK study was how staff’s sense-making could be influenced by their existing priorities and the school’s institutional ‘capacity’ in terms of the resources present to support implementation [22, 26, 35, 36]. Those leading implementation in one school were said to have creatively reinterpreted Learning Together, an anti-bullying intervention, as an intervention aiming to maintain the emotional health of pressurised students in an academically selective school. This evaluation further reported that, in another school, the lead reinterpreted the staff-student action group as being a site for students to learn the skills needed to avoid or respond to bullying (rather than, as intended, to coordinate intervention activities). This occurred in the context of the lead’s imprecise grasp of the intervention and inability to involve other staff.
The notion of ‘cognitive participation’ also recurred as a theme across studies, presented as a process of staff committing to implement an intervention. Various factors concerning the intervention and the school were identified as influencing the extent to which school agents felt able to commit to enact intervention activities. Like sense-making, cognitive participation was a process that was built across all stages of implementation. Several factors affected how cognitive participation developed.
Intervention ‘capability’ for local tailoring and adding value
A key sub-theme apparent in several studies was that school staff assessed intervention ‘capability’ (workability) in terms of ease of integration with existing practices [22, 24,25,26,27, 36]. Interventions that could be locally tailored or build on existing work were more likely to secure staff’s cognitive participation. An evaluation of low reliability and usefulness of the Drug Abuse Resistance Education (DARE) Plus intervention described how the assessment phase of the intervention was essential to tailor the intervention and develop commitment:
“The assessment phase of the organizing process is critical to its long-term success. It is invaluable to take the required time to get to know the community before attempting to launch an action team.” (p.17)
Another report describes how school staff bought into the use of restorative practice as an approach to discipline because this was viewed as providing a means of building on existing work and developing a consistent approach to discipline.
Interventions not viewed as being capable of local tailoring often failed to engender staff commitment, as reported by evaluations of the Responsive Classrooms and Positive Action interventions, respectively of high reliability and medium usefulness and low reliability and usefulness [22, 34].
A further sub-theme was that this lack of intervention capability for tailoring or adding value was particularly undermining for whole-school elements [22, 34]. Head-teachers and other school leaders could withhold commitment when they felt that whole-school actions might jeopardise their wider strategies. This could be the case, for example, where interventions required changes to school rewards or discipline policies that school leaders thought might weaken the school’s ability to pass school inspections or attract parents to send their children to the school. As an evaluation of Positive Action reported:
“Reluctance to change whole-school policy may be exacerbated by circumstances such as an upcoming [government] inspection: ‘It was hard to make a whole-school change to sanction and reward policy, so whole-school activity was harder to implement. [The government inspectorate] was coming and it would have been too big a change.’” (p.34)
Intervention ‘capability’ for using data to build commitment
Another sub-theme was that the provision of local data as part of the intervention could improve its ‘capability’ (workability) and build staff commitment [24,25,26,27, 30, 35, 36]. The evaluation of the Learning Together intervention suggested that providing such data could make it harder for staff to dismiss the need for intervention [25, 35, 36]. A staff-member on a pastoral team commented:
“I remember when [facilitator] came to present to [senior leadership team] and said how terrible our data was… it was like a tumbleweed moment; it was so funny. I mean… it wasn’t funny in a good way, but… but it was a realistic… realisation for everyone if you know what I mean… Because we all knew it was like that, but we didn’t realise how much the children didn’t actually like us.” (p. 990)
However, in an example of refutational synthesis, several studies identified that the provision of data could sometimes undermine staff commitment when staff interpreted the data as a criticism of their work to date or where data did not indicate positive trends after implementing an intervention [26, 30, 36, 37].
Intervention ‘capability’ in terms of student participation
A sub-theme from several UK evaluations of interventions aiming to encourage student participation in decisions was that students were more likely to commit to an intervention where this offered an opportunity for them to express their views [25, 26, 34,35,36,37].
Staff ‘potential’ for commitment based on perceived need
Staff commitment to interventions was influenced by staff’s ‘potential’: whether staff were attitudinally ready for such an intervention. A key sub-theme was that interventions should offer school leaders something they already knew they needed [22,23,24, 34, 36]. This might be a way of responding to government policies, pressures from parents or inspection requirements. Or it might address internal imperatives, such as school leaders’ existing strategies for school change. This theme was particularly clear in the UK studies of both Healthy School Ethos and Learning Together [25, 30, 35, 36]. The pilot evaluation of Learning Together, for example, reported:
“head teachers and their [management teams] consistently reported that it was important to address aggressive behaviours in order to recruit and retain ‘the best’ parents and students. [Managers] also suggested that this project was prioritised as it was seen as likely to impress the national school inspectorate… due to its focus on student voice and behaviour.” (p.328)
Interventions aiming to achieve whole-school change were more likely to get school leaders’ commitment when there was already a recognised need for change, for example because of poor inspection results [25, 30]. Reciprocally translating with this concept, it was apparent that in schools where leaders perceived no such urgent imperative for change, genuine school commitment was less likely.
Staff ‘potential’ for commitment based on existing strategies and values
A related sub-theme was that school staff had more attitudinal ‘potential’ for commitment to a whole-school intervention when their existing strategies and values made this seem attractive [22, 23, 25, 30, 32, 36]. New head-teachers were reported as particularly likely to commit to interventions involving whole-school change because these aligned with their desire to make their mark and change schools. Reciprocally translating with this concept of school leaders’ ‘potential’ was teachers’ ‘potential’ [23, 25]. For example, teachers with a prior commitment to social and character education within their classes were more likely to implement curricula addressing this, according to a study of Positive Action of medium reliability and low usefulness.
In cases where the values or priorities did not align, staff commitment appeared less likely [22, 23, 32]. For example, where staff or students perceived restorative practice to be a softer option, they were reportedly unlikely to commit to enacting it. As a study of Responsive Classrooms reported:
“In contrast, some middle school staff members’ beliefs about the value of punitive responses to problem behavior were incompatible with the core tenets of the intervention, which emphasized inclusion and opportunities to learn: ‘When you steal, there are real consequences; there’s jail or fines…’ These staff members believed that zero-tolerance policies, which use punishment as an extrinsic motivator for behavior change, were more effective than RC approaches.” (p.84)
A sub-theme concerned the possibility of schools committing to implementing only those intervention components aligning with their existing strategies and values, rejecting components that they regarded as deviating from these [26, 36].
Evaluations also examined processes by which those in schools engaged in ‘collective action’ (working together) to divide up responsibilities for delivering interventions. A number of factors were identified as influences on such processes.
Intervention ‘capability’ as workable
A key sub-theme was the importance of interventions being locally workable for staff enacting interventions as planned [22, 26, 36]. For example, curriculum materials which did not fit into the school curriculum or which did not provide staff with clear lesson plans tended to be adapted before they were delivered, or were not delivered at all [24, 35, 36].
An important aspect of workability was the extent to which guidance materials spelt out how delivery should proceed. For example, materials underpinning a restorative practice intervention needed to specify which staff-members were responsible and whether the intervention was intended to complement or replace punitive discipline.
Some interventions were not collectively enacted as had been planned. For example, an evaluation of Responsive Classrooms found that a new approach to discipline failed to work within the reality of schools:
“[P]articipants reported that a key RC strategy, Logical Consequences, in which a response to student misbehavior is tied to the specific incident and creates an opportunity for learning, was too unwieldy to implement in a way that students could anticipate and incorporate: ‘I totally agree with the theory behind logical consequences where you want the consequences that match the behavior and that’s, like, respectful to the child and respectful to the teacher. But it’s hard because it’s different every time… It’s not a system where they know, like, oh, if I do this I know what’s going to happen.’” (p.85)
Planning groups as a key element of intervention ‘capability’
An important sub-theme was that interventions which included planning groups, consisting of staff and sometimes students, parents or other community-members, were more workable in ensuring collective action. This was apparent from reports of the Gatehouse Project (of low reliability and medium usefulness) and Learning Together (of high reliability and usefulness) interventions [24,25,26, 35,36,37], and of other interventions [23, 24, 27, 32,33,34]. Diverse participation in such groups could support implementation by ensuring that the decisions made by the group were pragmatic and by achieving wider commitment across the school.
Such groups were reported to be particularly facilitative of whole-school approaches [24, 26, 35, 37]. These groups could also help ensure that intervention activities added up to a coordinated process of integrated school transformation, rather than merely a disparate set of initiatives.
Synergy between intervention components as a key element of intervention ‘capability’
A further sub-theme was that some interventions were more workable because they had better synergies between intervention components than others [22, 24,25,26, 30, 35,36,37]. Some intervention activities created the informational and relational resources needed to enable agents to enact other actions. The evaluation of the Gatehouse Project reported, for example:
“It is clear from our work that these elements – the adolescent health team, the school social climate profile, and the critical friend – do not work in isolation. The profile provides local data that are essential for identifying risk and protective factors relevant to the particular school community. The adolescent health team ensures that the responses to the profile are owned and implemented by the whole-school community. The critical friend provides expertise, impetus, motivation, and links to external resources.” (p. 380)
As described above, data on student needs being provided as part of an intervention could encourage others to implement intervention activities or lead to school staff producing or sharing other data [25, 30].
One area of synergy was where training components provided staff with the skills they needed to deliver other intervention elements. This could be valuable in ensuring staff accumulated and consolidated their skills.
Other reports focused on lack of intervention-component synergy as an inhibitor of collective action. For example, some evaluations reported that there was a noticeable lack of effective interaction between curriculum and whole-school components. In some cases, classroom curriculum activities were enacted but whole-school changes were incompletely delivered. In other cases, whole-school elements which aimed to build on existing school achievements were enacted but curriculum elements were not delivered with fidelity because these were judged unworkable [35, 36].
School ‘capacity’ to support collective action
The extent to which agents in schools could come together to collectively enact interventions also depended on school ‘capacity’ (i.e. the resources available to these agents). The lack of space in school timetables, and the lack of non-contact time within which school staff could plan intervention activities, were frequently reported by evaluations [22, 25, 27, 28, 33, 34, 36]. For example, the evaluation of low reliability and usefulness of the Cyber Friendly Schools intervention reported:
“Many teachers reported not being able to find sufficient time in their teaching curriculum to complete the eight learning activities.” (p. 104)
In the evaluations of low reliability and usefulness of the DARE Plus intervention and the PPP intervention [27, 33], whole-school elements were described as the most challenging and time-consuming to organise.
Staff struggled to marshal time and other resources when they were expected to deliver a new intervention alongside other initiatives. These situations diffused the resources available for any one intervention and eroded agents’ ability to commit the time needed to support effective decision-making and delivery. The evaluation of Responsive Classrooms, for example, reported:
“A school leader noted that ‘It’s not one new thing; it’s always five new things that we’re working on. I think the attention span is tested.’” (p.84)
Another resource factor in determining whether interventions were collectively enacted with fidelity was whether those charged with leading the intervention possessed leadership resources, such as a budget, the ability to direct other staff or the ability to modify policies or systems [22, 25,26,27, 30, 36, 37, 39]. Schools that gave intervention leadership roles to powerful staff consistently achieved better implementation according to several evaluations. Power and authority could be formal or informal, the latter reflecting individuals or groups having a long track-record at the school, strong relationships and an informal ability to persuade people to make things happen [35, 36]. An evaluation of Learning Together, for example, reported:
“In another school, despite there being no senior leaders on the group, the lead had worked for a long time at the school and was well respected and liked by both students and staff. Thus, it was possible to galvanise action without the formal involvement of senior leaders in some cases.” (p.989)
Where leadership commitment to intervention activities was limited or inconsistent, there may thus have been less collective vision and impetus for implementation, as reported in the evaluation of the Responsive Classrooms intervention. Lack of senior level support could also affect the drawing down of material and cognitive resources to support intervention activities [35, 36]. For example, some decisions made by action groups were stalled or rejected by other agents within the school system, such as head-teachers or school-leadership teams [35, 36].
Interventions could also be better implemented in schools characterised by strong connections between staff or with strong cultures of innovation [22,23,24, 32, 36]. In schools with strong connections, those agents leading interventions could draw on existing relational resources such as mutual support, observation and learning to support enactment, rather than attempting to develop this from a low baseline. An evaluation of Positive Action of high reliability and low usefulness reported:
“Stronger affiliation among teachers likely led to more opportunities to share ideas about PA materials and observe other teachers as they carried out PA activities outside of the classroom. This may have influenced teachers’ use of these supplementary program components, with higher levels of use by teachers who had perceptions of high engagement and support among teachers in their schools.” (p.1091)
An evaluation of the Gatehouse Project similarly reported the importance of networks connecting staff in enabling collective action.
A culture of teacher autonomy, as reported in the evaluation of Friendly Schools, could undermine collective action, because it was difficult for those leading an intervention to encourage the consistent enactment of new practices which deviated from locally understood norms and expectations of staff roles. Similarly, the evaluation of the Responsive Classrooms intervention reported:
“School staff observed that [Responsive Classrooms], a schoolwide intervention, ran counter to the school’s culture of individuality. For example, one teacher noted: ‘One… characteristic of [the school is]… there’s a lot of autonomy in terms of how teachers run their classrooms… it’s a little bit of territorial, like… I know what I’m doing and I have my way of doing it so I don’t need to participate necessarily in a whole-school anything.’” (p.84)
A staff culture of innovation could also support collective implementation. Such cultures could encourage staff to take the time to identify who would implement the intervention and then enact this with fidelity [23, 34].
Whole-school interventions took time to build. ‘Reflexive monitoring’ (whereby staff assessed the success of implementation through formal or informal processes) was important in determining the extent to which implementation built or dissipated over time.
Intervention ‘capability’ for reflexive monitoring
Reflexive monitoring worked well when interventions included this as an explicit component [24, 26, 30, 36], increasing their ‘capability’ (workability). Studies indicated that interventions were particularly successful when they included an action group that reviewed data, identified priorities, oversaw delivery and reflected on the results. This enabled members to reflexively monitor what was being enacted and with what consequences. Evaluations suggested that this gave participants the permission and resources to try different things, persist with what was perceived as working and refine or reject what was perceived to go less well. This approach allowed staff to abandon activities viewed as unsuccessful without rejecting the intervention overall. For example, an evaluation of the Gatehouse Project reported:
“This common purpose gave permission for teachers to try new strategies such as substantially restructuring student and teacher teams. For example, in one school, teachers worked together to reorganize classes into small groups of four or five learners and teachers into teaching teams to promote a collaborative and an academic environment.” (p.375)
As part of processes of reflexive monitoring, ‘quick wins’ evidencing positive outcomes could also help maintain and further build coalitions and commitment, and collective impetus to implement further intervention activities.
As well as groups, ongoing support from training, facilitation or coaching could also support reflexive monitoring by providing an opportunity for reflection and/or an outsider perspective. The importance of an external facilitator was, for example, described as follows in an evaluation of the Gatehouse Project:
“The support that [critical friend] provided in the staff room, in staff meetings, has been invaluable. We wouldn’t be where we are now, because I’d never recognized the value of having a person who is not a practicing teacher in the school at the moment… the way that you’ve been able to involve yourself in the discussion and the activities that are going on and come through with some very well-made points at crucial times, but in small groups and large groups.” (p. 377)
‘Collective reflexive monitoring’ to refine implementation
Reflexive monitoring could be a collective action oriented towards refining how an intervention was implemented [24, 33, 35]. For example, in the case of two interventions, over time staff in some schools opted to recruit fewer disengaged or disadvantaged students to participate in intervention activities [33, 36].
When external facilitators were removed in the Learning Together intervention, this resulted in the overall fidelity of implementation declining but some intervention components becoming mainstreamed so that their ‘form’ was modified at the same time as their ‘function’ became integrated within school policies and systems, as one evaluation reported:
“Most interviewees suggested that external facilitation was not necessary in the final year, but a few suggested this was a significant loss: ‘The absence of [facilitator] has been incredibly significant because she… was able to tie it in all the time to the agenda. And was a touchstone I suppose really for that. And then… so that… I think that was a loss’. (Senior leadership team member…)” (p.991)
Reflexive monitoring reinforcing implementation
Reflexive monitoring could reinforce the conditions necessary for further implementation [24, 26, 33, 36]. Staff and students recognised through processes of reflexive monitoring that interventions had diverse consequences for different parts of school systems, many of which were unanticipated. For example, an evaluation of the Gatehouse Project reported:
“not only has the work of the adolescent health team facilitated reviews of organizational structure, but it has also contributed to a substantial shift in the perceptions of what is the core business of schools. [As one staff member reported:] ‘But just really reinforcing the ideas of the positiveness and feeling secure at school, and certainly encouraging staff, that irrespective of what subject they teach, they can have an influence. And it’s a bit like planting a seed…’. There was also evidence of changing professional identity – teachers shifted their position from being a teacher of a subject or program to placing the young person and learning at the center of practice.” (p.379)
Similarly, involving students in decision-making or being surveyed about their needs could transform staff and student attitudes by suggesting that the school was becoming a more participative institution.