In the rush to embrace generative AI, an older form of artificial intelligence is proving it still has a critical edge—especially when it comes to diagnosing disease.
Key Points at a Glance
- Traditional diagnostic AI system DXplain outperformed ChatGPT and Gemini in clinical diagnosis
- DXplain correctly identified diagnoses 72% of the time with lab data, vs. 64% for ChatGPT and 58% for Gemini
- Without lab data, the expert system maintained its lead at 56% accuracy
- Researchers suggest combining expert systems with generative AI could yield even better results
In an age dominated by generative AI tools like ChatGPT and Gemini, it might be surprising to learn that a 40-year-old diagnostic system is still holding its ground—and even outperforming its flashier successors. Researchers at Massachusetts General Hospital (MGH), a part of Mass General Brigham, have shown that DXplain, a traditional diagnostic decision support system (DDSS) developed in 1984, continues to excel where it matters most: helping doctors diagnose disease.
Their study, published in JAMA Network Open, directly compared DXplain with leading large language models (LLMs) like ChatGPT and Gemini. Across 36 diverse patient cases—with and without lab data—DXplain consistently edged out the LLMs, correctly diagnosing 72% of cases when lab data was included, versus 64% for ChatGPT and 58% for Gemini. Without lab data, DXplain still led the pack at 56% accuracy.
But the findings weren’t just about competition—they revealed synergy. Each system picked up on conditions the others missed, hinting at a powerful future collaboration between rule-based and language-based AI. “Amid all the interest in large language models, it’s easy to forget that the first AI systems used successfully in medicine were expert systems like DXplain,” said Dr. Edward Hoffer, a co-author of the study.
While LLMs excel at language interpretation, summarizing medical texts, and natural conversation, DDSSs like DXplain rely on structured databases of symptoms, diseases, and clinical rules. This gives them an advantage in diagnostic accuracy, especially when lab data is sparse or cases are atypical.
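To make the rule-based approach concrete, the toy sketch below shows how a DDSS-style system can rank candidate diagnoses by matching a patient's findings against a structured knowledge base. DXplain's actual database and scoring algorithm are far larger and more sophisticated; the diseases, findings, and weights here are invented purely for illustration.

```python
# Simplified illustration of rule-based diagnostic scoring.
# The knowledge base, weights, and findings are invented for this example;
# they do not reflect DXplain's actual database or algorithm.

# Each disease maps findings to a weight reflecting how strongly that
# finding supports the diagnosis.
KNOWLEDGE_BASE = {
    "iron deficiency anemia": {"fatigue": 2, "pallor": 3, "low ferritin": 5},
    "hypothyroidism": {"fatigue": 2, "weight gain": 3, "elevated TSH": 5},
    "mononucleosis": {"fatigue": 2, "sore throat": 3, "lymphadenopathy": 4},
}

def rank_diagnoses(findings):
    """Score each disease by summing the weights of its matched findings."""
    scores = {}
    for disease, weighted_findings in KNOWLEDGE_BASE.items():
        score = sum(w for f, w in weighted_findings.items() if f in findings)
        if score > 0:
            scores[disease] = score
    # Highest-scoring candidates first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    patient_findings = {"fatigue", "pallor", "low ferritin"}
    for disease, score in rank_diagnoses(patient_findings):
        print(f"{disease}: {score}")
```

Because the knowledge base is explicit, every suggestion can be traced back to the findings that triggered it, which is part of why such systems hold up well when cases are atypical.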
The Mass General Brigham team sees potential in combining these strengths. LLMs could scan and extract data from unstructured physician notes or electronic health records, feeding it into a system like DXplain to generate more reliable diagnoses. Such a hybrid approach could elevate decision-making and reduce diagnostic errors—especially in high-pressure clinical settings.
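The hybrid workflow the researchers describe can be pictured as a two-stage pipeline: a language model turns free-text notes into structured findings, and a rule-based engine ranks diagnoses from those findings. The sketch below is purely illustrative; `call_llm` is a placeholder for whatever LLM API would actually be used, and `query_ddss` stands in for a system such as DXplain, with canned outputs so the example runs on its own.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned response here."""
    return '["fatigue", "pallor", "low ferritin"]'

def extract_findings(note_text: str) -> set[str]:
    """Ask the LLM to pull structured findings out of an unstructured note."""
    prompt = (
        "List the clinical findings in this note as a JSON array of strings:\n"
        + note_text
    )
    return set(json.loads(call_llm(prompt)))

def query_ddss(findings: set[str]) -> list[str]:
    """Placeholder for a diagnostic decision support system such as DXplain."""
    # A real DDSS would match the findings against its disease database
    # and return a ranked differential; here we return a canned list.
    return ["iron deficiency anemia", "hypothyroidism"]

def hybrid_diagnosis(note_text: str) -> list[str]:
    """LLM extracts structured findings; the DDSS ranks candidate diagnoses."""
    findings = extract_findings(note_text)
    return query_ddss(findings)

if __name__ == "__main__":
    note = ("58-year-old woman with months of fatigue; "
            "exam notable for pallor; ferritin is low.")
    print(hybrid_diagnosis(note))
```

The design point is the division of labor: the LLM handles the messy language, while the diagnostic reasoning stays in a system whose knowledge base and logic are explicit and auditable.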
“These systems can recall rare diseases or obscure symptom combinations physicians might overlook, particularly under stress,” noted lead author Dr. Mitchell Feldman. “Pairing that with the flexible interface of LLMs could lead to faster, more accurate care.”
The researchers also emphasized that this isn't about replacing doctors, but about giving them sharper tools. Expert systems offer the memory and logic; LLMs provide the narrative fluency. Together, they could represent the next evolution in AI-assisted medicine, one that brings the best of both worlds to the bedside.
Source: Mass General Brigham