Health Technology Assessment 2003; Vol 7: number 27

Executive Summary

Evaluating non-randomised intervention studies

JJ Deeks (1)*
J Dinnes (2)
R D’Amico (1)
AJ Sowden (3)
C Sakarovitch (1)
F Song (4)
M Petticrew (5)
DG Altman (1)

In collaboration with the International Stroke Trial and the European Carotid Surgery Trial Collaborative Groups

(1) Centre for Statistics in Medicine, Institute of Health Sciences, Oxford, UK
(2) Southampton Health Technology Assessments Centre, University of Southampton, UK
(3) NHS Centre for Reviews and Dissemination, University of York, UK
(4) Department of Public Health and Epidemiology, University of Birmingham, UK
(5) MRC Social and Public Health Sciences Unit, University of Glasgow, UK

* Corresponding author

Background

In the absence of randomised controlled trials (RCTs), healthcare practitioners and policy-makers rely on non-randomised studies to provide evidence of the effectiveness of healthcare interventions. However, there is controversy over the validity of non-randomised evidence, related to the existence and magnitude of selection bias.

Objectives

To consider methods and related evidence for evaluating bias in non-randomised intervention studies.

Methods

1. Three reviews were conducted to consider:

2. New empirical investigations were conducted generating non-randomised studies from two large, multicentre RCTs by selectively resampling trial participants according to allocated treatment, centre and period. These were used to examine:

The resampling design overcame particular problems of meta-confounding and variability of direction and magnitude of bias that hinder the interpretation of previous reviews.
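The resampling idea can be illustrated with a minimal sketch. It assumes a hypothetical list of trial participant records with the fields `centre`, `period`, `arm` and `outcome`; these names, the helper functions and the toy data are illustrative assumptions and are not taken from the monograph. One plausible reading of the design is that a concurrently controlled study takes treated participants from one centre and controls from another, whereas a historically controlled study takes treated participants from a later period and controls from an earlier period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    centre: str   # recruiting centre (hypothetical label)
    period: int   # recruitment period, e.g. 1 = earlier, 2 = later
    arm: str      # randomly allocated arm: "treated" or "control"
    outcome: int  # 1 = event occurred, 0 = no event


def select(trial, *, arm, centre=None, period=None):
    """Return participants in one arm, optionally filtered by centre and period."""
    return [
        p for p in trial
        if p.arm == arm
        and (centre is None or p.centre == centre)
        and (period is None or p.period == period)
    ]


def concurrent_control_study(trial, treated_centre, control_centre):
    """Treated participants from one centre, controls from a different centre."""
    treated = select(trial, arm="treated", centre=treated_centre)
    controls = select(trial, arm="control", centre=control_centre)
    return treated, controls


def historical_control_study(trial, centre, early_period, late_period):
    """Treated participants from the later period, controls from the earlier one."""
    treated = select(trial, arm="treated", centre=centre, period=late_period)
    controls = select(trial, arm="control", centre=centre, period=early_period)
    return treated, controls


def risk_difference(treated, controls):
    """Event risk in the treated group minus event risk in the control group."""
    def risk(group):
        if not group:
            raise ValueError("group is empty")
        return sum(p.outcome for p in group) / len(group)
    return risk(treated) - risk(controls)


# Toy data only: eight invented participants, not real trial records.
toy_trial = [
    Participant("A", 1, "treated", 0), Participant("A", 1, "control", 1),
    Participant("A", 2, "treated", 1), Participant("A", 2, "control", 0),
    Participant("B", 1, "treated", 0), Participant("B", 1, "control", 0),
    Participant("B", 2, "treated", 1), Participant("B", 2, "control", 1),
]

treated, controls = concurrent_control_study(toy_trial, "A", "B")
print(risk_difference(treated, controls))
```

Because every participant was originally randomised, the result of each artificial non-randomised study can be compared with the randomised comparison drawn from the same data, which is what allows bias to be measured directly.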

Results

Empirical comparisons of randomised and non-randomised evidence

Eight studies compared results of randomised and non-randomised studies across multiple interventions using meta-epidemiological techniques. The studies reached conflicting conclusions, explicable by differences in:

The only deducible conclusions were (a) results of randomised and non-randomised studies sometimes, but not always, differ and (b) both similarities and differences may often be explicable by other confounding factors.

Quality assessment tools for evaluating non-randomised studies

We identified 194 tools that could be or had been used to assess non-randomised studies. Around half were scales and half were checklists. Most were published within systematic reviews, and most were poorly developed, with scant attention paid to principles of scale development.

Sixty tools covered at least five of six pre-specified internal validity domains (creation of groups, blinding, soundness of information, follow-up, analysis of comparability, analysis of outcome), although the degree of coverage varied. Fourteen tools covered three of four core items of particular importance for non-randomised studies (How did allocation occur? Was the study designed to generate comparable groups? Were prognostic factors identified? Was case-mix adjustment used?). Six tools were thought suitable for use in systematic reviews.

Use of quality assessment in systematic reviews of non-randomised studies

Of 511 systematic reviews that included non-randomised studies, only 169 (33%) assessed study quality. Many used quality assessment tools designed for RCTs or developed by the authors themselves, and did not include key quality criteria relevant to non-randomised studies. Sixty-nine reviews investigated the impact of quality on study results in a quantitative manner.

Empirical estimates of bias associated with non-random allocation

The bias introduced by non-random allocation was noted to have two components. First, the bias could lead to consistent over- or underestimations of treatment effects. This occurred for historical controls, the direction of bias depending on time trends in the case-mix of participants recruited to the study. Second, the bias increased variation in results for both historical and concurrent controls, owing to haphazard differences in case-mix between groups. The biases were large enough to lead studies to conclude, falsely, that a treatment produced significant benefit or harm.
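The two components can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reproduction of the monograph's analysis: the baseline risk of 0.30, the time trend of 0.10, the range of between-centre differences and the group size are all hypothetical, and the true treatment effect is set to zero so that any non-zero average estimate is bias.

```python
import random
import statistics


def simulate_events(rng, n, risk):
    """Number of events among n participants who each have the given event risk."""
    return sum(rng.random() < risk for _ in range(n))


def estimated_risk_difference(rng, n, treated_risk, control_risk):
    """Observed treated-minus-control risk difference in one simulated study."""
    treated = simulate_events(rng, n, treated_risk)
    controls = simulate_events(rng, n, control_risk)
    return treated / n - controls / n


def run(design, reps=500, n=200, seed=0):
    """Mean and standard deviation of the estimates across many simulated studies."""
    rng = random.Random(seed)
    base = 0.30  # hypothetical baseline risk; the true treatment effect is zero
    estimates = []
    for _ in range(reps):
        if design == "randomised":
            control_risk = base  # same case-mix in both groups
        elif design == "historical":
            control_risk = base + 0.10  # assumed time trend: earlier patients were sicker
        elif design == "concurrent":
            # assumed haphazard case-mix difference between centres, zero on average
            control_risk = base + rng.uniform(-0.10, 0.10)
        else:
            raise ValueError(f"unknown design: {design}")
        estimates.append(estimated_risk_difference(rng, n, base, control_risk))
    return statistics.mean(estimates), statistics.stdev(estimates)


results = {d: run(d) for d in ("randomised", "historical", "concurrent")}
for design, (mean, sd) in results.items():
    print(f"{design:11s} mean estimate = {mean:+.3f}  SD = {sd:.3f}")
```

Under these assumptions the historical design shows a systematic shift in the average estimate, while the concurrent design is unbiased on average but spreads its estimates more widely than the randomised comparison, so an individual study is more likely to land far from the truth.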

Empirical evaluation of case-mix adjustment methods

Four strategies for case-mix adjustment were evaluated: none adequately adjusted for bias in historically and concurrently controlled studies. Logistic regression on average increased bias. Propensity score methods performed better, but were not satisfactory in most situations. Detailed investigation revealed that adequate adjustment can only be achieved in the unrealistic situation when selection depends on a single factor. Omission of important confounding factors can explain underadjustment. Correlated misclassifications and measurement error in confounding variables may explain the observed increase in bias with logistic regression, as may differences between conditional and unconditional odds ratio estimates of treatment effects.
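The difference between conditional and unconditional odds ratios can be shown with a worked example. The figures are hypothetical: two equal-sized prognostic strata, treatment perfectly balanced across them (so there is no confounding), and the same within-stratum odds ratio of 4 in each. The sketch only demonstrates the arithmetic; it does not reproduce any estimate from the monograph.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def risk_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1 + o)


# Hypothetical inputs: control-group event risk in each prognostic stratum,
# and a treatment odds ratio that is identical within every stratum.
conditional_or = 4.0
control_risk = {"low risk": 0.2, "high risk": 0.8}

# Treated-group risk in each stratum implied by the conditional odds ratio.
treated_risk = {
    stratum: risk_from_odds(conditional_or * odds(risk))
    for stratum, risk in control_risk.items()
}

# Strata are equal in size and treatment is balanced across them, so the
# marginal risks are simple averages over strata.
marginal_control = sum(control_risk.values()) / len(control_risk)
marginal_treated = sum(treated_risk.values()) / len(treated_risk)

# Unconditional (marginal) odds ratio, ignoring the prognostic factor.
unconditional_or = odds(marginal_treated) / odds(marginal_control)

print(f"conditional OR   = {conditional_or:.2f}")
print(f"unconditional OR = {unconditional_or:.2f}")
```

Here the unconditional odds ratio is about 2.58 although the odds ratio within each stratum is 4. Adjusting for a prognostic factor therefore changes the quantity being estimated even when no confounding is present, so an adjusted estimate can move away from an unadjusted benchmark for reasons unrelated to bias removal.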

Conclusions

Results of non-randomised studies sometimes, but not always, differ from results of randomised studies of the same intervention. Non-randomised studies may give seriously misleading results even when treated and control groups appear similar in key prognostic factors. Standard methods of case-mix adjustment do not guarantee removal of bias. Residual confounding may be high even when good prognostic data are available, and in some situations adjusted results may appear more biased than unadjusted results.

Although many quality assessment tools exist and have been used for appraising non-randomised studies, most omit key quality domains. Six tools were considered potentially suitable for use in systematic reviews, but each requires revision to cover all relevant quality domains.

Healthcare policies based upon non-randomised studies or systematic reviews of non-randomised studies may need re-evaluation if the uncertainty in the true evidence base was not fully appreciated when policies were made.

The inability of case-mix adjustment methods to compensate for selection bias and our inability to identify non-randomised studies which are free of selection bias indicate that non-randomised studies should only be undertaken when RCTs are infeasible or unethical.

Recommendations for further research

Publication

By Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess 2003;7(27).

NHS R&D HTA Programme

The NHS R&D Health Technology Assessment (HTA) Programme was set up in 1993 to ensure that high-quality research information on the costs, effectiveness and broader impact of health technologies is produced in the most efficient way for those who use, manage and provide care in the NHS.

Initially, six HTA panels (pharmaceuticals, acute sector, primary and community care, diagnostics and imaging, population screening, methodology) helped to set the research priorities for the HTA Programme. However, during the past few years there have been a number of changes in and around NHS R&D, such as the establishment of the National Institute for Clinical Excellence (NICE) and the creation of three new research programmes: Service Delivery and Organisation (SDO); New and Emerging Applications of Technology (NEAT); and the Methodology Programme.

The research reported in this monograph was identified as a priority by the HTA Programme’s Methodology Panel and was funded as project number 96/26/99.

The views expressed in this publication are those of the authors and not necessarily those of the Methodology Programme, HTA Programme or the Department of Health. The editors wish to emphasise that funding and publication of this research by the NHS should not be taken as implicit support for any recommendations made by the authors.

Criteria for inclusion in the HTA monograph series

Reports are published in the HTA monograph series if (1) they have resulted from work commissioned for the HTA Programme, and (2) they are of a sufficiently high scientific quality as assessed by the referees and editors.

Reviews in Health Technology Assessment are termed ‘systematic’ when the account of the search, appraisal and synthesis methods (to minimise biases and random errors) would, in theory, permit the replication of the review by others.

Methodology Programme Director: Professor Richard Lilford

HTA Programme Director: Professor Kent Woods

Series Editors: Professor Andrew Stevens, Dr Ken Stein, Professor John Gabbay, Dr Ruairidh Milne and Dr Rob Riemsma

Managing Editors: Sally Bailey and Sarah Llewellyn Lloyd

The editors and publisher have tried to ensure the accuracy of this report but do not accept liability for damages or losses arising from material published in this report. They would like to thank the referees for their constructive comments on the draft document.

© 2003 Crown Copyright