March 17, 2025

Can artificial intelligence diagnose malocclusion?

Artificial intelligence has developed rapidly over the past few years. This has led to several initiatives in orthodontics, including using AI to diagnose malocclusion, which could potentially increase accuracy and reduce the need for clinician assessment. This new study examines the accuracy of AI in diagnosing features of malocclusion.

The authors of this paper highlight previous research into AI within medicine and dentistry. This shows that most AI tools have lower diagnostic validity than assessments made by clinicians. Consequently, all AI models must undergo testing to prevent errors and potential patient harm.

A team from London, UK, and Damascus, Syria, did this study. The AJO-DO published the paper.

This paper is open access, so we can all read it without being a member of a specialist society.

I want to declare an interest. I have worked with Samer Mheissen, Martyn Cobourne and Farooq Ahmed.

What did they ask?

They did this study to:

“Evaluate the accuracy of the AI assessment (SmileMate) of dental and occlusal parameters using standardised clinical photography compared with a clinical assessment”.

What did they do?

They did a prospective clinical study. This had the following stages.

  • A sample size calculation suggested they needed to evaluate records of 31 participants.
  • They collected a convenience sample of student dentists and postgraduate specialist trainees.
  • Then they took colour intraoral photographs using standardised views. These were frontal, right and left buccal, maxillary and mandibular occlusal views.
  • At the same visit a single operator did a clinical assessment using a standardised examination template.
  • Finally, they used the SmileMate software to generate the AI assessment.
  • Another operator assessed 4 of the participants to measure reliability.

The team presented descriptive statistics and looked for an association between the direct clinical and AI assessments.

Then, they used the kappa statistic to evaluate the agreement between the two assessments. 
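For readers who want to see how this statistic works, here is a minimal sketch of a kappa calculation in Python. The ratings below are invented for illustration; they are not the study's data.

```python
# A minimal sketch of Cohen's kappa, using invented ratings
# rather than the study's data. Requires scikit-learn.
from sklearn.metrics import cohen_kappa_score

# Hypothetical presence/absence calls (1 = feature present, 0 = absent)
clinician = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
ai        = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# Kappa corrects raw agreement for the agreement expected by chance:
# 1 = perfect agreement, 0 = no better than chance.
print(f"Cohen's kappa: {cohen_kappa_score(clinician, ai):.2f}")  # 0.40 here
```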

Finally, they calculated sensitivity, specificity, and receiver operating characteristic curves to determine the overall accuracy of the artificial intelligence assessment.

This was a comprehensive array of statistical tests, and some readers may not be acquainted with sensitivity and specificity. I will attempt to explain these important measures.

Sensitivity and specificity describe the accuracy of a test that indicates the presence or absence of a condition. A positive result suggests that a person has the condition, while a negative result suggests they do not.

Sensitivity is the true positive rate: the probability that the test correctly identifies the condition when it is present. Specificity is the true negative rate: the probability that the test correctly returns a negative result when the condition is absent.

In essence, an ideal test should demonstrate both high sensitivity and high specificity. I trust that I have clarified this point. As with most statistical measures, there is some controversy surrounding the interpretation of these values. Nevertheless, I have endeavoured to keep this straightforward.
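To make the arithmetic concrete, here is a short Python sketch. The 2×2 counts are invented, chosen only so the resulting rates echo the study's headline figures; they are not the paper's actual data.

```python
# A minimal sketch of sensitivity and specificity from a 2x2 table.
# These counts are invented for illustration, not taken from the paper.
tp, fn = 36, 14  # condition present: correctly / incorrectly identified
tn, fp = 11, 9   # condition absent: correctly / incorrectly identified

sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

print(f"Sensitivity: {sensitivity:.0%}")  # 72%
print(f"Specificity: {specificity:.0%}")  # 55%
```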

What did they find?

The team presented a substantial amount of data. These were the key findings.

  • They found a statistically significant difference between the two assessments for the following domains: maxillary crowding, overbite, oral hygiene, gingivitis, recession, decay, fractures, tooth wear, and need for whitening.
  • They did not find differences for lateral open bite, peg-shaped teeth, missing teeth, retained primary teeth, overjet, maxillary and mandibular spacing, crossbite, and canine classification.
  • When they looked at the sensitivity and specificity data, they could only use 11 parameters. They found that the sensitivity was 72% and the specificity was 55%.

The team summed up their findings by stating:

“There was a large range of accuracy, agreement, sensitivity and specificity”.

Their conclusions were:

“The overall agreement of the AI to a clinician was fair. The overall sensitivity was 72% and specificity was 54%. AI-generated assessments are inadequate for evaluating malocclusion”.

What did I think?

This was an interesting study. However, we must consider some limitations, which the authors acknowledged and highlighted. Importantly, a single operator conducted the clinical assessment, which introduces bias and limits the generalisability of the findings. Nevertheless, this paper was a valuable first step in this area of research.

When I examined the data, it was encouraging that the artificial intelligence accurately identified some aspects of malocclusion. However, for others, such as mandibular and maxillary crowding, the specificities were only 15% and 17%, respectively. This means that only 15% of patients without mandibular crowding would be correctly assessed by the AI as free of crowding. Consequently, many patients without crowding would be incorrectly diagnosed as having it.
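To put that 15% figure in concrete terms, here is an illustrative calculation using a hypothetical cohort of 100 crowding-free patients (the cohort size is mine, not the paper's):

```python
# Illustrative arithmetic only: a hypothetical group of 100 patients
# who do NOT have mandibular crowding, assessed at 15% specificity.
patients_without_crowding = 100
specificity = 0.15

true_negatives = specificity * patients_without_crowding      # correctly cleared
false_positives = patients_without_crowding - true_negatives  # wrongly flagged

print(f"Correctly assessed as crowding-free: {true_negatives:.0f}")   # 15
print(f"Incorrectly diagnosed with crowding: {false_positives:.0f}")  # 85
```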

It is worth considering the sensitivity and specificity of other commonly used tests, for example, the PSA screening test for prostate cancer. At a cut-off of 4 ng/ml, the sensitivity is 86% and the specificity is 33%. This low specificity means that a high proportion of men may be assessed as having prostate cancer when it is not present. As a result, PSA is not recommended as a screening test in the UK and some other countries.

What is the clinical relevance of this work?

I have thought about the clinical relevance of these findings. The artificial intelligence lacks sufficient accuracy in several key areas. This means that its assessments will require verification by a clinician before deciding on treatment. Therefore, the AI adds little value in its current form.

More worryingly, problems could occur if artificial intelligence is misapplied in its current state. For instance, if AI-based assessment were offered directly to consumers without a clinical examination, ineffective or even unnecessary treatment might be suggested.

Undoubtedly, artificial intelligence for diagnosis will ultimately become an integral part of our healthcare system. However, it is also clear that this AI model requires refinement before we can routinely utilise this exciting development. Research of this kind is essential.


Have your say!

  1. Hello Dr. O’Brien,

    Regarding PSA and false positives, is it not better to do further testing than to miss a case of prostate cancer (i.e., something is better than nothing)?

    I am also interested in what definitions were used to determine the three-level parameters in the study (slight vs. noticeable). It seems rather subjective, and more of an opinion than fact, if definitions were not in place.

    From my understanding, AI in orthodontics is meant to be an adjunct tool for orthodontists, not a standalone tool for doctors or a pathway to a direct-to-consumer model. AI plus human = better than AI or human alone. I would like to hear different perspectives on the subject.
    Sincerely,
    Tracy
