An occasionally irregular blog about orthodontics

Historical controls are not valid for orthodontic research?


This post is about the use of historical controls in orthodontic research and is based on a new paper in the EJO. To some readers this may appear to be a little dry, but I think that this paper is very important as it provides information that may cast doubt on the conclusions of a large amount of published research.

In the past, a number of orthodontic studies have been carried out by comparing the results of a treatment against an untreated historical control. This has been done when it has not been possible to include an untreated control, or when it is too costly or difficult to run a randomised trial. When this research has been evaluated, questions have been raised about the validity of the historical control, particularly with respect to the selection of the control subjects and whether the historical group is different from contemporary children. I have posted about this before. This paper gives us further information on this question.

Bias from historical control groups used in orthodontic research: A meta-epidemiological study

Papageorgiou et al.

European Journal of Orthodontics 2016, Advance access

DOI: 10.1093/ejo/cjw035




I thought this was an interesting paper that provided very useful information. In the introduction, the authors wrote a nice outline of the use of historical data. They explained that this data was derived from universities, individual practices and large-scale longitudinal growth studies. Importantly, historical data may have temporal bias because the historical control subjects may not be similar to our current patients. They also pointed out that, as historical data was frequently confined to cephalometric measures, the outcome measures were not always relevant to patient values.

What did they do?

The aim of their study was to attempt to identify differences between the results of trials with concurrent and historical control groups. They also wanted to find out if there were any differences between studies with different types of historical control.

They did this by analysing data from previous systematic reviews. (A systematic review of systematic reviews!).

They used standard methodology to identify the studies and collect data. They then analysed this data with relevant, and rather complex, statistics to compare the following:

  1. Concurrent (contemporary) and historical control groups
  2. Historical control groups based on growth studies or clinical archived records.

What did they find? 

They initially found 294 systematic reviews and after filtering they ended up with 14 reviews. They extracted the data from these reviews and analysed 65 meta-analyses from 122 trials. All the studies used cephalometric outcome measures. The studies could be divided into:

Trials                           n (%)
Randomised clinical trial        31 (25%)
Prospective controlled trial     40 (33%)
Retrospective controlled trial   48 (39%)
Unclear                          3 (3%)

They calculated the Standardised Mean Difference (SMD) for the cephalometric measurements. Their data analysis revealed that trials with historical controls showed smaller treatment effects than trials with concurrent controls (SMD = -0.31; 95% CI -0.53 to -0.10).
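As an aside for readers less familiar with the statistic: the SMD expresses a between-group difference in units of the pooled standard deviation, which is what makes cephalometric measurements comparable across different studies. Here is a minimal sketch in Python, using entirely hypothetical ANB values (none of these numbers come from the paper):

```python
import math

def standardised_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Cohen's d: (treatment mean - control mean) / pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd

# Hypothetical example: a 1.0-degree difference in mean ANB change between
# a treated group and an untreated control, with SD = 2.0 degrees in both
# groups of 30 patients.
smd = standardised_mean_difference(1.5, 0.5, 2.0, 2.0, 30, 30)
print(round(smd, 2))  # 0.5
```

On this scale, the reported bias of -0.31 means that trials with historical controls returned treatment effects roughly a third of a standard deviation smaller than trials with concurrent controls.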

When they looked at different types of historical control, they found that trials with historical controls based on growth studies showed larger treatment effects than trials based on clinical archived records (SMD = 0.40; 95% CI 0.21 to 0.59).

They suggested that there was bias arising directly from the use and choice of the historical control group and that treatment effects were exaggerated. This reinforced recent evidence that non-randomised controlled trials are associated with excess statistical significance. Furthermore, secular trends have been reported to exist in the widely used orthodontic growth studies. Their final conclusion was that historical control groups should be avoided.

They set out some really nice, clear conclusions. I felt that the most important were:

  • For clinical questions, studies should include concurrent untreated controls.
  • When untreated controls are not available, or not possible, a control group receiving standard therapy should be used (treatment as usual).
  • Studies with retrospective designs and historical control groups should be viewed with caution.

What did I think?

I thought that this was a very complex paper that provided us with very useful information. This paper is likely to become very important because it casts doubt on the findings of many previous trials that have used historical controls, and of systematic reviews that have included this type of study.

As these findings are potentially important, I have looked at this paper very critically and I will point out three important issues.

  1. We are not certain that they identified all the studies that were relevant to the question. Nevertheless, using systematic reviews as a source of this data would have reduced selection bias and is an efficient method to adopt.
  2. They calculated the standardised mean differences from cephalometric values and these were rather small. But we also need to remember that many of these papers report very small differences in cephalometric measurements. As a result, the standardised mean differences are similar in scale to the reported treatment effects.
  3. I was surprised that trials with historical controls showed smaller treatment effects compared to trials with concurrent controls. I assumed that this would be the other way round and that the differences would be greater for historically controlled trials. This is also supported by the other literature that the authors quoted, where they stated that “the probability of a treatment to be found effective by a trial with a historical control was increased by 293-383% compared to RCTs”. The authors did not comment on this in their discussion and perhaps they could make a comment on this blog?

I think that these issues should not detract from their findings, and I would like to suggest the following:

This paper adds to the evidence against the validity of the retrospective record “trawl” and comparison with a historical record.

I hope that journal reviewers and editors consider these findings when they evaluate future studies.


There is 1 Comment


  1. Spyros Papageorgiou says:

    Hi Kevin and thanks for the nice blog on our paper.

    I will try to give some more information on this point.

    As far as the direction of the effect magnitude is concerned, we did indeed find that systematic differences exist between the results of studies with concurrent controls and studies with historical controls. As this difference was consistent in all identified reviews and was not due to chance, it was interpreted as a sign of bias.

    Now, as far as our analysis is concerned, we modeled the influence of the intervention group’s nature (prospective or retrospective) independently from the influence of the control group’s nature (concurrent or historical), and did a multivariable analysis within each included meta-analysis to combine them. This enabled us to quantify the “pure” effect of the control group, while accounting for the confounding effect of the intervention group, and vice versa.

    This means that the SMD=-0.31 that was found for concurrent vs historical controls should be explained only by the nature of the control group, and not by the nature of the intervention group. However, the SMD is calculated in each meta-analysis as (Treatment value - Control value). Therefore, it is logical to assume that if the SMD of studies with historical controls is smaller than that of studies with concurrent controls, and this can be explained only by the control group, then studies with historical control groups overestimate the data of the untreated control patients. That is, “retrospective” control groups exaggerate the results of the control group. This seems to logically agree with the notion that retrospective studies in orthodontics exaggerate the results of treatment, which we found in our earlier paper (Papageorgiou SN, Xavier GM, Cobourne MT. Basic study design influences the results of orthodontic clinical investigations. J Clin Epidemiol. 2015 Dec;68(12):1512-22) and confirmed in this one.

    As far as the comparison of our results with those of Sacks and colleagues is concerned (Sacks, H., Chalmers, T.C. and Smith, H., Jr. (1982) Randomized versus historical controls for clinical trials. American Journal of Medicine, 72, 233–240), the two studies broadly agree on the notion that historical controls are associated with biased results. However, the two studies are not directly comparable, as in the study of Sacks and colleagues treatment effects were judged on the yes/no basis of statistical significance (and not effect magnitude or direction), while no separate analysis for intervention and control groups was made.

    Finally, on a different note. Both you and the journal’s reviewers mentioned that the effect of SMD=-0.31 is somewhat small. However, I think this should not be taken that lightly. Firstly, we are somewhat uncertain about the magnitude of the bias, as the predictive intervals showed that it could range between SMDs of 0.07 and 0.57. However (and most importantly), we have to understand that this is but a single methodological characteristic of clinical trials that could introduce bias. There are several other design characteristics, like randomization, “prospectiveness” (or “retrospectiveness”) of the intervention group, blinding, etc., that could co-exist with historical control groups and therefore add supplementary bias.

    Therefore, our aim, both when designing and when appraising a clinical study, would be to provide data that are as little biased as possible—which means eliminating all factors that could introduce bias.

