Item 1. Did the research questions and inclusion criteria for the review include the components of PICO?  (ratings: YES, NO)
There were no challenges with this item.
Item 2. Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol?  (ratings: YES, PARTIAL YES, NO)
Prospectively planned and written SR protocols could reduce the risk of bias in SRs. There are three challenges with this item. First, the item requires that a SR contains “an explicit statement that the review methods were established prior to the conduct of the review” and that SR authors “justify any significant deviations from the protocol”. However, an explicit statement without access to the protocol is insufficient to rate the protocol’s contents and any deviations. Second, it is unclear how to rate this item if a SR protocol exists but deviations are either not explained or are extensive. It is also unclear how to decide whether deviations are extensive. Third, the existence of a protocol does not guarantee that the protocol was completed before the SR commenced or, at least, before study selection and possibly data coding were completed. Thus, it is unclear how to rate the item if the SR protocol was registered shortly before the completed SR was submitted for peer review, unless SR authors give reasons for this.
Item 3. Did the review authors explain their selection of the study designs for inclusion in the review?  (ratings: YES, NO)
The inclusion of different study designs should be justified in SRs. This is especially important because AMSTAR 2 can be used to appraise SRs that include different study designs, such as RCTs, NRSI, or both. There are two challenges with this item. First, the inclusion of RCTs is often not explicitly explained in SRs. Although RCTs are considered the gold standard, they may not be appropriate to evaluate the outcomes in some clinical fields. Several methodological studies with AMSTAR 2 showed that this item was rated NO in over 90% of the appraised SRs [10, 13, 14, 15]. However, some SRs explained that the highest quality of evidence in their field was available from RCTs and thus implicitly justified the inclusion of RCTs. Second, the wording of item 3 in the AMSTAR 2 tool is open to different interpretations and appears more conservative than in the AMSTAR 2 guidance document. Specifically, according to the AMSTAR 2 guidance document, the item can be rated if SR authors provide a rationale for including specific study designs in the full text of their SR, whereas the AMSTAR 2 tool requires an explanation for including specific study designs.
Item 4. Did the review authors use a comprehensive literature search strategy?  (ratings: YES, PARTIAL YES, NO)
Requirements for a comprehensive literature search strategy are listed in the AMSTAR 2 tool and the AMSTAR 2 guidance document. There are several challenges with this item. First, to assess the search strategy in a SR, access to the complete search strategy is required. However, according to the item it is sufficient if only the keywords are reported. Reporting of keywords only is associated with poorer reproducibility of search results than disclosing the entire search strategy. Other important aspects of the search strategy are how the search terms were combined with Boolean operators, the search timeframe, the last search date and the search location (e.g., in titles only). Second, some Cochrane groups have their own trial databases established from searches in several other databases. If such a database is available, authors performing a Cochrane SR will usually search only this single database. Thus, it is unclear how to rate this item if a single database is searched even if such a database includes trials from several other databases. Third, the search for gray literature and the search in trial registries are listed as separate requirements for the YES rating in the AMSTAR 2 tool, while the search in trial registries is mentioned as an example of the gray literature search in the AMSTAR 2 guidance document. It is also unclear whether trial or study registries need to be searched in all cases. While RCTs may be registered, registration of NRSI is far from standard. Fourth, it is unclear how SR completion is defined to assess if the search strategy was recent for the YES rating in the AMSTAR 2 tool. The peer review process of a SR can take several months, meaning that the SR can be published long after its actual completion. If there are more than 24 months between the last search and publication, the SR may only obtain a PARTIAL YES rating, although it was completed within 24 months. Fifth, including or consulting content experts needs to be done by default according to the AMSTAR 2 guidance document. However, it is unclear how experts are defined, how many should be consulted, and whether they belong to the SR team or are external people not involved in SR production. An example of an expert could be a librarian or information specialist, who may also be a co-author on the SR.
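The fourth challenge can be made concrete with a minimal Python sketch. It assumes the 24-month window described above and compares two candidate reference dates for "completion". All dates, as well as the helper names `months_between` and `search_is_recent`, are hypothetical and serve illustration only; AMSTAR 2 does not prescribe which reference date should be used.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not yet complete
    return months


def search_is_recent(last_search: date, reference: date, limit_months: int = 24) -> bool:
    """True if the last search lies within `limit_months` of the reference date."""
    return months_between(last_search, reference) <= limit_months


# Hypothetical timeline of one SR (illustrative dates only).
last_search = date(2019, 1, 15)
submitted = date(2020, 11, 1)   # assumed proxy for SR completion
published = date(2021, 6, 1)    # publication after peer review

recent_at_submission = search_is_recent(last_search, submitted)
recent_at_publication = search_is_recent(last_search, published)

print(months_between(last_search, submitted), recent_at_submission)
print(months_between(last_search, published), recent_at_publication)
```

In this hypothetical timeline the same search counts as recent when measured against submission but not when measured against publication, which is the ambiguity described above.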
Item 5. Did the review authors perform study selection in duplicate?  (ratings: YES, NO)
There are two challenges with this item. First, the YES rating requires that studies were selected in duplicate and that high consensus was achieved between SR authors. It is unclear how to rate this item if consensus between SR authors is not mentioned. Second, additional explanations are required to rate this item. For example, the AMSTAR 2 tool does not make clear whether the complete study selection process should be done in duplicate (i.e., screening of titles and abstracts as well as screening of full texts), while the AMSTAR 2 guidance document suggests that this may be required. Furthermore, the procedure for computing agreement or dealing with poor agreement between SR authors is not specified. For example, it is unclear if agreement should be computed for the complete study selection process and what sample size of studies should be chosen to compute agreement.
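To illustrate what "computing agreement" could look like, the following Python sketch calculates Cohen's kappa, a commonly used chance-corrected agreement statistic, for two reviewers' screening decisions. The choice of kappa is an assumption made here for illustration and is not taken from the AMSTAR 2 tool. The function name `cohens_kappa` and the screening decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must classify the same non-empty set of records.")
    n = len(rater_a)
    # Proportion of records on which both raters made the same decision.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical title/abstract screening decisions for ten records.
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "exclude", "include"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this example raw agreement is 80%, while kappa is considerably lower because most records are excluded by both reviewers. Which statistic, which threshold and which sample of records should be used is not defined by the item.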
Item 6. Did the review authors perform data extraction in duplicate?  (ratings: YES, NO)
Challenges similar to those for item 5 also apply to this item. For example, it is not clear how to deal with small samples of extracted studies. In addition, this item refers only to data extraction and not to the risk of bias assessment of the primary studies. Thus, it is unclear if such an assessment should also be performed in duplicate. This issue is covered neither by this item nor by item 9 (risk of bias in individual studies) of AMSTAR 2. Interestingly, the current version of the Cochrane Handbook suggests that it should be mandatory to perform the risk of bias assessment in duplicate (Version 6.3, Chap. 7, Sect. 7.3.2).
Item 7. Did the review authors provide a list of excluded studies and justify the exclusions?  (ratings: YES, PARTIAL YES, NO)
This item is particularly important for replicability of SRs and detecting any biases in study selection. There are two challenges with this item. First, it is unclear if the list of excluded studies should show all studies selected for full-text screening from title and abstract screening. The requirement for the PARTIAL YES rating indeed appears to refer to such studies from title and abstract screening as “all potentially relevant studies”. Second, it is unclear if reasons for exclusion should be reported only for studies screened in full-text or for all studies. Once again, it appears that the requirement for the YES rating refers only to studies screened in full-text as “each potentially relevant study”.
Item 8. Did the review authors describe the included studies in adequate detail?  (ratings: YES, PARTIAL YES, NO)
This item requires that a SR describes PICO and study designs for the PARTIAL YES rating as well as setting and timeframe for follow-up for the YES rating. The majority of SRs indeed meet the minimum requirements for this item, resulting in a high proportion of PARTIAL YES ratings [13, 14]. However, the challenge with this item is that there are no thresholds for deciding if study characteristics are reported in adequate detail. Whether information on study characteristics is adequately “detailed” is often judged differently among AMSTAR 2 users and requires a high degree of judgment and subjective decision making. According to the AMSTAR 2 guidance document, the details “should be sufficient for an appraiser, or user, to make judgments about the extent to which the studies were appropriately chosen (in relation to the PICO structure)”. Thus, what counts as sufficient detail may depend on SR aims as well as the individual expectations of AMSTAR 2 users.
Item 9. Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review?  (separate ratings for RCTs and NRSI: YES, PARTIAL YES, NO, INCLUDES ONLY NRSI, INCLUDES ONLY RCTs)
This item distinguishes between RCTs and NRSI and is accompanied by extensive notes for rating of NRSI in the AMSTAR 2 guidance document. There are two challenges with this item. First, it is unclear how to rate this item if the RoB was adequately assessed in only one study type, such as RCTs, but not in NRSI or vice versa. Second, it is not explicitly required that the RoB should be assessed in duplicate. Such a procedure is already recommended in the Cochrane Handbook (Version 6.3, Chap. 7, Sect. 7.3.2) and could reduce any difficulties in rating this item.
Item 10. Did the review authors report on the sources of funding for the studies included in the review?  (ratings: YES, NO)
Funding for primary studies and potential sources of conflict of interest in the SRs are addressed by two AMSTAR 2 items (items 10 and 16, respectively). In general, the sources of funding in primary studies are often not reported in SRs and thus this item is not fulfilled by the vast majority of SRs [10, 14]. According to the AMSTAR 2 guidance document, the information on funding in item 10 is needed to assess any conflicts of interest related to such funding (e.g., bias towards results that favour products sponsored by study funders). The challenge with item 10 is that it addresses only the sources of funding in primary studies, unlike item 16, which addresses any sources of conflict of interest (including funding for conducting the SR). Thus, the discrepancy between the content of the two items creates a misconception that funding alone could affect primary studies while SRs could also have other sources of potential conflicts of interest. As stated in item 16 and in the AMSTAR 2 guidance document, there could be several sources of potential conflict of interest, such as professional conflicts.
Item 11. If meta-analysis was performed did the review authors use appropriate methods for statistical combination of results?  (separate ratings for RCTs and NRSI: YES, NO, NO META-ANALYSIS CONDUCTED)
The AMSTAR 2 guidance document lists some requirements for rating the appropriateness of a meta-analysis. There are three challenges with this item. First, in contrast to the AMSTAR 2 guidance document, item 11 does not mention that the justification for performing meta-analysis should be planned a priori and included in the SR protocol. It is also not clear what information should be provided in such a justification, although the AMSTAR 2 guidance document suggests that studies should be “compatible (in terms of populations, controls and interventions)”. Second, in addition to justification for meta-analysis, the YES rating has several other requirements that are in general not agreed upon in the field of meta-analysis, such as the type of weighting technique and computation or adjustment for heterogeneity. It is unclear how to rate this item if at least one of the requirements is not fulfilled. There is also no guidance on what specific meta-analytic methods can be rated as appropriate. In general, this item can be rated only if SRs provide a detailed description of the meta-analysis that is sufficient to replicate the analysis and is in accordance with the Cochrane Handbook (Version 6.3, Chap. 10: Analyzing data and undertaking meta-analyses). Such a description should include (1) the effect-size computation, (2) the weighting technique (e.g., inverse-variance, Mantel-Haenszel), (3) the statistical software package used, (4) the meta-analytical model type for the main analysis (e.g., random-effects model), (5) the computation of or adjustment for heterogeneity, and (6) the exploration of heterogeneity, such as subgroup analysis, sensitivity analysis or meta-regression. Third, similar to item 9, it is unclear how to rate this item if meta-analysis was adequately performed in only one study type, such as RCTs, but not in NRSI or vice versa.
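The following Python sketch illustrates why the weighting technique and the model type need to be reported for an analysis to be replicable. It pools the same effect sizes with inverse-variance weights under a fixed-effect model and under a random-effects model with the DerSimonian-Laird estimate of between-study variance. The choice of these two methods is an assumption made for illustration and is not a method endorsed by AMSTAR 2. The function names and the data are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance (tau squared)."""
    if len(effects) < 2:
        return 0.0  # between-study variance is not estimable from one study
    weights = [1.0 / v for v in variances]
    pooled, _ = pool_fixed(effects, variances)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - (len(effects) - 1)) / c)


def pool_random(effects, variances):
    """Random-effects pooled estimate, its standard error, and tau squared."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    pooled, se = pool_fixed(effects, [v + tau2 for v in variances])
    return pooled, se, tau2


# Hypothetical effect sizes (e.g., log odds ratios) and their variances.
effects = [0.10, 0.30, 0.50]
variances = [0.01, 0.01, 0.01]

fixed_estimate, fixed_se = pool_fixed(effects, variances)
random_estimate, random_se, tau2 = pool_random(effects, variances)
print(fixed_estimate, fixed_se)
print(random_estimate, random_se, tau2)
```

With these hypothetical data both models return the same pooled estimate, but the random-effects standard error is about twice as large. A reader who does not know which model and weighting were used therefore cannot reproduce the reported confidence interval.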
Item 12. If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis?  (ratings: YES, NO, NO META-ANALYSIS CONDUCTED)
There are five challenges with this item. First, it is unclear how to rate this item because the AMSTAR 2 tool refers to SRs with meta-analysis while the AMSTAR 2 guidance document suggests that in the absence of meta-analysis SR authors “should still provide some commentary on the likely impact of RoB on individual study results”. Second, the item is fulfilled in case of low RoB, but only if the SR includes RCTs, while no rating guidance is provided if the SR includes NRSI with low RoB. Third, the item requires that SR authors perform sensitivity analyses to investigate the impact of variable or high RoB on the pooled effects in meta-analysis. However, it is unclear what analysis can be considered adequate (e.g., meta-regression or subgroup analysis). Fourth, the rating of this item is difficult if sensitivity analysis is not performed because all studies in the SR have a high RoB or there are too few studies for such an analysis. Fifth, it is unclear how to rate this item if the RoB assessment was already rated as inadequate in item 9 and thus may itself be biased.
Item 13. Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review?  (ratings: YES, NO)
Assuming that the RoB assessment was adequately performed in item 9, the main challenge in rating this item is to decide to what extent the impact of the RoB in primary studies should be discussed in a SR (e.g., one general sentence vs. a detailed paragraph on RoB of the primary and secondary outcomes). The AMSTAR 2 guidance document notes that RoB should especially be considered in any recommendations for clinical care or policy in a SR. It is also unclear how to rate this item if multiple outcomes were assessed in a SR, but the impact of RoB was discussed, for example, only for the primary outcome.
Item 14. Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?  (ratings: YES, NO)
There are two challenges with this item. First, it is unclear how heterogeneity is defined in this item. According to the Cochrane Handbook, clinical and methodological heterogeneity can lead to a variation in study effect estimates that can be detected as statistical heterogeneity in a meta-analysis (Version 6.3, Chap. 10, Sect. 10.10.1). Clinical heterogeneity in terms of similar PICO factors must be considered in the justification for performing meta-analysis and in choosing the meta-analytic model (fixed-effect or random-effects). Statistical heterogeneity is often referred to simply as heterogeneity and can be measured in a meta-analysis. It appears that the term heterogeneity in this item refers to any heterogeneity (clinical, methodological or statistical). Second, it is unclear how heterogeneity should be computed and when the explanation and discussion of heterogeneity are adequate. In general, any sensitivity analyses, such as subgroup or meta-regression analyses, require a meaningful number of studies (e.g., at least five). This number may not be reached if meta-analysis was performed with fewer than five studies (e.g., Cochrane Handbook, Version 6.3, Chap. 10, Sect. 10.10.2). Numerous and not preplanned subgroup analyses are questionable and affect the credibility of results of meta-analysis because of the increased risk of false-positive findings.
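As an illustration of how statistical heterogeneity can be measured, the following Python sketch computes Cochran's Q and the I² statistic from study effect sizes and their variances. These two statistics are chosen here as common examples and are an assumption of this sketch, since the item itself does not state how heterogeneity should be computed. The function name `heterogeneity` and the data are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I-squared in percent."""
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted mean of the study effects.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and variances from four studies.
effects = [0.10, 0.30, 0.50, 0.30]
variances = [0.01, 0.01, 0.01, 0.01]

q, df, i2 = heterogeneity(effects, variances)
print(round(q, 2), df, round(i2, 1))
```

A value such as this only quantifies statistical heterogeneity. It does not by itself explain clinical or methodological heterogeneity, which is what the item asks SR authors to discuss.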
Item 15. If they performed quantitative synthesis did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review?  (ratings: YES, NO, NO META-ANALYSIS CONDUCTED)
Publication bias can affect the outcomes of a meta-analysis and thus needs to be addressed in SRs along with other biases in primary study selection or methods. There are five challenges with this item. First, it is unclear what investigation of publication bias can be considered adequate and what reasons for not performing the analysis (e.g., due to a small number of studies) are acceptable for the YES rating. Second, the requirement for the YES rating consists of two criteria: (1) that a publication bias test is performed and (2) that the result of such a test is discussed. However, it is unclear to what extent the results of a publication bias test should be discussed. Interestingly, this item also does not require that the publication bias test is (appropriately) reported in SRs. In fact, SR readers are unable to verify the interpretation of any potential publication bias if the results of a publication bias test are not reported in the text or in a standard figure, such as a funnel plot. Third, publication bias cannot always be captured graphically or statistically in a meaningful way when fewer than 10 studies are included in a meta-analysis (e.g., Cochrane Handbook, Version 6.3, Chap. 13). Interestingly, the AMSTAR 2 guidance document highlights the importance of the context and setting of a SR (e.g., industry-sponsored SRs) as well as a deep and intensive literature search that includes, for example, the search for gray literature (see also item 4). Thus, in a methodological study, this item was pragmatically rated YES if the search strategy considered gray literature. Fourth, the rating NO META-ANALYSIS CONDUCTED implies that publication bias is relevant only for SRs with meta-analysis. However, publication bias should be discussed as a potential source of bias in all SRs, although the extent of such discussion is not defined in this item. For example, one sentence in the limitations section may not be sufficient to address publication bias in a SR. Fifth, it is unclear if publication bias should be assessed and discussed (1) for each outcome in a meta-analysis separately and (2) for all studies in a meta-analysis irrespective of study type (RCTs and NRSI).
Item 16. Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review?  (ratings: YES, NO)
The challenge with item 16 is that it is not clear what sources of conflict of interest should be considered in SRs. While financial interests need to be disclosed in academic journals, there are also intellectual or institutional conflicts of interest that may apply to all or individual SR authors. Furthermore, it is unclear what constitutes acceptable management of any potential conflicts. One suggestion could be that SR authors with potential conflicts of interest are not involved in some aspects of SR production (e.g., the risk of bias assessment of their own studies included in SRs or overviews of SRs) and that they explicitly state any methods implemented to reduce the risk of any author biases [19, 20].