The search yielded 1163 titles and abstracts from Medline, PsycINFO and the Web of Science. Of these, 145 full-text studies were checked for eligibility, and a total of 20 RCTs met the inclusion criteria (see Fig. 1).
Meta-analysis of 20 effect sizes from 20 unique studies, with a total sample of 3222 participants, indicated that, on average, health coaching interventions for T2DM have a small but statistically significant (positive) effect on reducing HbA1c (g+ = 0.29, 95% CI: 0.18 to 0.40). Visual inspection of the funnel plot showed no asymmetry in the distribution of the studies and therefore no indication of publication bias. Egger’s regression was also non-significant (p = 0.730), consistent with a lack of publication bias.
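As a rough check on the pooled estimate above, the standard error and z statistic can be back-calculated from the reported confidence interval. This is a minimal sketch that assumes a symmetric, normal-approximation 95% interval (critical value 1.96); the source does not state how the interval was computed, and the variable names are illustrative.

```python
import math

# Reported pooled effect and 95% CI (from the text above).
g_pooled = 0.29
ci_lower, ci_upper = 0.18, 0.40

# Assumption: the CI is g +/- 1.96 * SE (normal approximation).
Z_CRIT = 1.96
se = (ci_upper - ci_lower) / (2 * Z_CRIT)

# z statistic for the null hypothesis of no effect, and its two-sided p-value.
z = g_pooled / se
p_two_sided = math.erfc(z / math.sqrt(2))

print(f"SE = {se:.3f}, z = {z:.2f}")  # SE = 0.056, z = 5.17
print(f"two-sided p < 0.001: {p_two_sided < 0.001}")
```

Under this assumption the interval excludes zero by a wide margin, which agrees with the description of the effect as statistically significant.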
The effect sizes (d) of interventions ranged from d = −0.05 to d = 0.78. None of the interventions had a large effect size, and only three had a medium effect size (d = 0.71 to d = 0.78) [42, 45, 51, 53]. The remaining 17 interventions had small (d ≥ 0.20) [36, 38,39,40, 43, 46,47,48,49, 52] or trivial (d < 0.20) effect sizes [34, 35, 37, 41, 50]. Cochran’s Q was statistically significant (Q = 36.68, p = .009), suggesting that the effect sizes were heterogeneous, and the I² statistic indicated that close to half of the observed variance in effect sizes was attributable to heterogeneity between studies rather than to sampling error (I² = 48.20%). This indicates a need for moderation analysis to identify variables that account for the variability.
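The reported I² follows directly from Q and the number of effect sizes. The sketch below uses the standard Higgins and Thompson definition, I² = max(0, (Q − df) / Q) × 100 with df = k − 1; the function name is illustrative and not taken from the source.

```python
def i_squared(q: float, k: int) -> float:
    """Return I^2 (in percent) from Cochran's Q and the number of effect sizes k."""
    df = k - 1  # degrees of freedom for the Q test
    if q <= 0:
        return 0.0
    # I^2 is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


i2 = i_squared(q=36.68, k=20)
print(f"I^2 = {i2:.2f}%")  # I^2 = 48.20%
```

With Q = 36.68 and k = 20 (df = 19), this reproduces the value of 48.20% given in the text.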
Table 1 reports the characteristics of the included studies for both the intervention (health coaching) and control (usual care) groups, including sample size, mean age of participants, intervention duration, personnel, and mode of delivery (e.g., face-to-face, telephone-based, web-based). The included studies comprised 20 RCTs published between 1950 and 2022. A total of 3222 participants were included in the 20 studies, of whom 1674 were randomised to receive coaching interventions and 1548 were allocated to control groups. Half of the studies (n = 10) were conducted in the US [34,35,36,37,38,39,40,41,42,43], two were conducted in Taiwan [44, 45], and the remaining eight were each conducted in a different country: Turkey, Canada, South Korea, Norway, Finland, Germany, Belgium, and Australia. In the 17 studies that reported the gender of participants, 53% of participants were female. The mean age of the recruited participants was 59.3 (SD = 6.2). Because other demographic and socioeconomic characteristics, such as education, ethnicity and income status, were reported inconsistently across the 20 papers, we were unable to report them here. Participants were recruited from a variety of sources, including ethnic community centres, community health centres [34, 48, 49], community advertisements [43, 47, 49, 51], primary care or hospital clinics [38, 41, 45, 46, 53] and databases [40, 44, 50, 52]. For clinical factors, including HbA1c, there were no discernible differences between the intervention and control groups at baseline. The mean HbA1c level across all studies at baseline was 8.42% (SD = 0.78). The reduction in HbA1c was found to be clinically significant (decrease of ≥5 mmol/mol) in eight studies [36, 40, 42,43,44, 46, 47, 51].
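Because baseline HbA1c is given in percent while the clinical significance threshold is given in mmol/mol, it can help to put both on one scale. The sketch below assumes the standard IFCC to NGSP master equation (NGSP % = 0.09148 × IFCC mmol/mol + 2.152); the source does not state which conversion, if any, the authors applied, and the function names are illustrative.

```python
# Assumption: IFCC <-> NGSP master equation; not stated in the source.
SLOPE = 0.09148
INTERCEPT = 2.152


def ngsp_to_ifcc(percent: float) -> float:
    """Convert an HbA1c level from NGSP (%) to IFCC (mmol/mol)."""
    return (percent - INTERCEPT) / SLOPE


def ifcc_change_to_ngsp_change(delta_mmol_per_mol: float) -> float:
    """Convert a change in HbA1c from mmol/mol to percentage points.

    The intercept cancels when converting a difference, so only the slope applies.
    """
    return SLOPE * delta_mmol_per_mol


baseline_ifcc = ngsp_to_ifcc(8.42)
threshold_pct_points = ifcc_change_to_ngsp_change(5)

print(f"baseline = {baseline_ifcc:.1f} mmol/mol")             # baseline = 68.5 mmol/mol
print(f"threshold = {threshold_pct_points:.2f} % points")     # threshold = 0.46 % points
```

Under this assumption, the 5 mmol/mol threshold corresponds to a drop of roughly half a percentage point from the mean baseline of 8.42%.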
Moderation analysis of the sample characteristics indicated that intervention effectiveness was not related to age (β = 0.19, p = 0.442) or gender (β = − 0.13, p = 0.603). Moderation analysis of the study characteristics indicated that only the type of primary outcome measure was significantly related to intervention effectiveness (Q = 4.20, p = 0.040), such that studies including HbA1c as the primary outcome (g+ = 0.32, k = 16) were more effective than studies with other primary outcomes (g+ = 0.10, k = 4).
Mode of delivery and intervention duration
Health coaching was delivered through various modes: exclusively telephone-based [34, 39, 43, 47, 52, 53]; exclusively web- or mobile-based remote patient monitoring/electronic assistance (ERPM/EA) systems; a combination of face-to-face and telephone-based [36, 38, 40, 42, 44,45,46]; face-to-face and ERPM/EA; telephone-based and ERPM/EA [49,50,51]; or face-to-face, telephone-based and ERPM/EA [35, 41]. The duration of studies ranged from two to 18 months (Mdn = 6 months). Only six studies reported separate figures for intervention and follow-up durations, with intervention duration ranging from three to 10 months (Mdn = 6 months) and the duration of follow-ups ranging from six to 12 months (Mdn = 7 months). Mode of delivery (Q = 1.17, p = 0.556) and the duration of study (β = 0.14, p = 0.535), intervention (β = −0.04, p = 0.916) and follow-up (β = −0.25, p = 0.574) were not significantly related to intervention effectiveness (see Table 3).
The health coaching interventions were delivered by a range of personnel. In four studies, the health coaching intervention was delivered by untrained personnel [34, 41, 44, 46, 53], while the remaining 16 interventions reported training of the interventionist on health coaching. Seven studies relied on nurses to deliver coaching sessions [34, 36, 47,48,49, 52], four studies provided interventions by trained health coaches [35, 37, 50, 51], and only one study was delivered by health coaches certified by the International Coach Federation (ICF). The remaining interventions were delivered by different professionals, including dental care providers, community health workers, dieticians, medical staff [38, 42], pharmacists, psychologists, college students, peer patients, and physicians. Type of intervention provider was not significantly related to intervention effectiveness (Q = 1.24, p = 0.538) (see Table 3).
Behavioural framework and theory use
The heterogeneity of interventions was evident in the approaches employed and the theories underpinning them. Out of the 20 papers, five studies did not report the use of theories [34, 37, 44, 48, 51, 53]. The remaining 15 were grounded in different theories or frameworks. Most of these employed motivational interviewing (MI) [35, 36, 40, 42, 45,46,47, 49, 52], two studies used the transtheoretical model [38, 49], and self-efficacy theory, cognitive-behavioural therapy and social-cognitive theory were each used once [39, 46]. The use of theory was not significantly related to intervention effectiveness (Q = 1.34, p = 0.247), nor was the specific use of MI (Q = 0.23, p = 0.632) (see Table 3).
A total of 23 BCTs were identified across the 20 studies reviewed (see Table 5). Interventions varied in the number of BCTs they used, ranging from 0 to 9 BCTs, with a median of 5 BCTs per intervention. The most frequently coded BCT was 1.1 goal setting (behaviour), which was identified in 13 interventions [34,35,36, 38,39,40,41, 45, 46, 49,50,51]. 1.2 problem solving was the second most commonly identified BCT, reported in 10 interventions [35,36,37,38,39, 41, 43, 49, 52, 53]. Two BCTs, 1.4 action plan [34, 35, 39, 40, 45, 46, 50, 53] and 3.1 social support (unspecified) [35, 37,38,39, 44, 45, 47, 48], were each reported in eight studies. 1.7 review outcome goals, 1.8 behavioural contract, 2.2 feedback on behaviour, 4.1 instruction on how to perform a behaviour, 8.7 graded tasks, 12.5 adding objects to the environment, and 2.5 monitoring outcome(s) of behaviour by others without feedback were each used only once, across six interventions [37, 39, 46, 48, 52, 53]. No BCTs were identified in one study.
BCTs and intervention effectiveness
An overview of the use of different BCTs and the effect sizes found in each study is presented in Table 5. The most effective intervention based on the effect size (d = 0.78) used only one BCT: 3.1 social support (unspecified). Only one BCT, 1.1 goal setting (behaviour), was used across all the interventions with a medium effect size, although it was also the most commonly used BCT across interventions with small or trivial effects.
There was no evidence of an association between the number of BCTs used in an intervention and its effect size (β = −0.11, p = 0.651) (see Table 2). Of the moderation analyses conducted for the 23 different BCTs identified, only two yielded significant results. Specifically, interventions that used credible sources of information (BCT 9.1) (Hedges’ g+ = 0.08, k = 5) were significantly less effective than interventions that did not use this BCT (Hedges’ g+ = 0.34, k = 15; Q = 7.67, p = 0.006). In addition, interventions that used social reward (BCT 10.4) (Hedges’ g+ = 0.01, k = 3) were significantly less effective than interventions that did not use this BCT (Hedges’ g+ = 0.32, k = 17; Q = 3.92, p = 0.048).
Quality of the included studies
Although some studies showed good methodological quality, with low risk of bias [44, 45, 50,51,52], the majority were weak because of either high or unclear risk of bias [34, 35, 37,38,39,40,41,42,43, 46,47,48,49, 53]. Eleven of the 20 studies [34, 39, 42, 44, 45, 47, 49,50,51,52,53] described the method of randomization generation and 10 studies [34, 40, 42, 44, 45, 47, 50,51,52,53] used a concealed allocation schedule. The methodological quality of blinding of participants and personnel to group assignment was generally low across most studies, owing to either high or unclear risk of bias in the procedures and insufficient detail in reporting. Across all the included studies, the risk of attrition bias and of selective outcome reporting bias was low. Table 4 and Fig. 2 provide further details about the quality of the included studies.