Real cases where AI caught what doctors missed — and the uncomfortable questions that raises about how we practice medicine.
The Scan That Changed Everything
In Sweden, between 2021 and 2022, more than 100,000 women underwent routine mammography screening as part of the MASAI trial — the first randomized controlled trial of AI-assisted breast cancer detection ever conducted. Half the women had their scans read by radiologists alone. The other half had an AI system review their mammograms first, flagging suspicious areas for human review.
The results, published in The Lancet Digital Health in 2024 and later in The Lancet itself in 2025, were striking. The AI-supported group caught 81 percent of cancers at screening, compared to 74 percent in the standard group. More importantly, AI-assisted screening led to 16 percent fewer invasive interval cancers — the aggressive tumors that show up between scheduled screenings, the ones most likely to kill. There were also 21 percent fewer large tumors and 27 percent fewer of the most aggressive non-luminal A subtypes in the AI group.
The radiologists did not work harder. They worked less. AI-assisted reading cut their workload by 44 percent, because the algorithm triaged cases — routing clearly normal scans away from human review and concentrating radiologist attention on the cases that actually needed expert eyes. The false positive rate barely changed: 1.5 percent in the AI group versus 1.4 percent in the control group.
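The arithmetic behind that workload figure is worth seeing. Sweden's standard screening protocol is double reading: two radiologists per scan. The back-of-envelope model below shows how triage cuts total reads; the 88/12 split is illustrative, chosen to reproduce the reported 44 percent, and is not MASAI's actual threshold.

```python
# Back-of-envelope model of AI triage in screening mammography.
# The 0.88 / 0.12 split below is a hypothetical illustration chosen to
# match the reported 44% workload reduction, not the trial's real cutoff.
n = 100_000  # screened women, roughly the MASAI cohort size

standard_reads = 2 * n  # default protocol: every scan read twice

# Triage: AI-cleared scans get one human reader, flagged scans keep two.
triaged_reads = int(n * (0.88 * 1 + 0.12 * 2))

reduction = 1 - triaged_reads / standard_reads
print(f"screen-reading workload cut: {reduction:.0%}")
```

The point of the sketch is that the savings come almost entirely from removing the second reader on scans the algorithm confidently clears, while the highest-risk scans still get two sets of human eyes.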
This was not a tech demo. It was a population-level study with real patients, real cancers, and a real reduction in the kinds of tumors that are hardest to treat. The AI was not replacing the radiologists. It was showing them where to look.
But the story of AI in medicine is not a simple tale of machines saving lives. It is more complicated than that, and the complications matter as much as the breakthroughs.
What AI Actually Sees That Doctors Miss
The advantage AI holds over human clinicians is not intelligence. It is consistency. A radiologist reading mammograms at 4 p.m. on a Friday, after reviewing 80 other scans, does not perform the same as that same radiologist at 9 a.m. on a Monday. An algorithm does. It never gets tired, never gets distracted, and never unconsciously anchors on an earlier diagnosis. It processes every pixel of every image with the same mathematical rigor, every time.
In September 2024, Harvard Medical School researchers published a study in Nature introducing CHIEF (Clinical Histopathology Imaging Evaluation Foundation), an AI model trained on 15 million pathology images and refined on 60,000 whole-slide tissue samples. CHIEF achieved 94 percent accuracy in cancer detection across 19 cancer types — lung, breast, prostate, colorectal, stomach, liver, pancreatic, and twelve others. It outperformed existing AI approaches by up to 36 percent on specific tasks.
What made CHIEF remarkable was not just its accuracy but its versatility. The model identified gene mutations directly from tissue slides — achieving 96 percent accuracy for EZH2 mutations in lymphoma, 89 percent for BRAF mutations in thyroid cancer, and 91 percent for NTRK1 mutations in head and neck cancers. These are determinations that traditionally require separate, expensive molecular testing. CHIEF collapsed multiple diagnostic steps into one.
The model was validated across 32 independent datasets from 24 hospitals worldwide. That breadth matters enormously. Many AI systems perform brilliantly on the data they were trained on and fall apart when deployed at a different hospital with different equipment, different patient demographics, and different tissue preparation methods. CHIEF held up across all of them.
Beyond imaging, AI is making inroads into bedside care. At UC San Diego Medical Center, an AI algorithm called COMPOSER continuously monitors emergency department patients for early signs of sepsis — a condition that kills nearly 350,000 Americans annually. A January 2024 study in npj Digital Medicine found that COMPOSER reduced sepsis mortality by 17 percent across more than 6,000 patient admissions. The system flags deterioration hours before traditional monitoring catches the change, and it does so silently, running in the background without disrupting clinical workflow.
In April 2024, the FDA authorized the Sepsis ImmunoScore — the first AI diagnostic tool specifically for sepsis — through its De Novo regulatory pathway. A study published in NEJM AI showed it stratified patients into risk categories that predicted in-hospital mortality with uncomfortable precision: 0 percent for low risk, 1.9 percent for medium, 8.7 percent for high, and 18.2 percent for very high risk. That kind of granularity gives emergency physicians something they rarely have — time to intervene before a patient crashes.
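The four risk bands amount to a simple lookup. In the sketch below, only the band names and the observed in-hospital mortality rates come from the NEJM AI study; the numeric score cutoffs are hypothetical placeholders, since the tool's real thresholds are proprietary.

```python
# Risk stratification sketch for a sepsis score. Band names and observed
# mortality rates are from the published study; the score cutoffs here
# are invented placeholders for illustration only.
MORTALITY_BY_BAND = {
    "low": 0.000,        # 0% observed in-hospital mortality
    "medium": 0.019,     # 1.9%
    "high": 0.087,       # 8.7%
    "very_high": 0.182,  # 18.2%
}

def risk_band(score: float) -> str:
    """Map a 0-1 risk score to a band (cutoffs illustrative)."""
    if score < 0.25:
        return "low"
    elif score < 0.5:
        return "medium"
    elif score < 0.75:
        return "high"
    return "very_high"

for s in (0.1, 0.4, 0.6, 0.9):
    band = risk_band(s)
    print(f"score={s:.1f} -> {band:9s} observed mortality {MORTALITY_BY_BAND[band]:.1%}")
```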
The FDA has now authorized over 1,300 AI-enabled medical devices, with roughly 80 percent targeting radiology and medical imaging. As of 2025, radiology accounted for 1,039 of the approved tools. Adoption is accelerating too: among radiologists outside the EU, use of AI tools roughly doubled between 2018 and 2024, from 20 percent to 48 percent.
The Uncomfortable Deskilling Problem
Here is where the narrative gets complicated. In August 2025, The Lancet Gastroenterology & Hepatology published a study that should give every healthcare administrator pause.
Researchers at four endoscopy centers in Poland tracked 19 experienced endoscopists performing colonoscopies before and after AI-assisted detection was introduced into their workflow. Before AI arrived, these doctors detected precancerous growths — adenomas — at a rate of 28.4 percent during standard procedures. After several months of working with AI assistance, their detection rate when working without AI dropped to 22.4 percent: an absolute drop of six percentage points, and a relative decline of roughly 20 percent in unaided performance.
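The decline is straightforward arithmetic on the two reported detection rates:

```python
# Deskilling arithmetic from the Polish colonoscopy study.
before = 28.4  # adenoma detection rate (%) before AI was introduced
after = 22.4   # unaided detection rate (%) after months of AI assistance

absolute_drop = before - after               # percentage points
relative_drop = absolute_drop / before * 100  # relative decline, ~21%

print(f"absolute drop:    {absolute_drop:.1f} percentage points")
print(f"relative decline: {relative_drop:.1f}%")
```

The raw rates imply a relative decline just over 20 percent, which is the figure usually quoted for the study.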
The implications are unsettling. The doctors had not forgotten their training. They had become dependent on the AI in ways they probably did not notice. When the system flagged suspicious polyps for months on end, the clinicians gradually stopped looking as hard on their own. Their pattern recognition atrophied, the way a muscle does when you stop using it.
This is not an abstract concern. Hospitals lose power. Software crashes. Systems get updated and go offline for hours or days. Internet connections fail. If the doctors behind these systems have lost the ability to perform without them, patients face real danger during every moment of downtime. And in rural or under-resourced hospitals, downtime is not the exception — it is the norm.
A meta-analysis published in early 2025, examining 83 studies and 4,762 cases across multiple specialties, found that physicians still outperformed AI models by an average of 14.4 percentage points in overall diagnostic accuracy. Expert physicians performed even better. The gap between AI and non-expert physicians, however, was not statistically significant — which suggests that AI’s greatest value may be in elevating the performance of less experienced clinicians rather than replacing experts.
The takeaway is nuanced but important: AI makes average doctors better, but it may also make good doctors complacent. The technology works best as a tool that supplements human skill, not one that substitutes for it. Training programs may need to build in regular “AI-off” practice sessions, the same way pilots train for engine failures even though engines almost never fail.
Drug Discovery: From Decades to Months
If AI’s role in diagnosis is complicated, its role in drug discovery is more straightforwardly promising — and the numbers are starting to prove it.
In October 2024, Google DeepMind’s Demis Hassabis and John Jumper shared the Nobel Prize in Chemistry for AlphaFold, the AI system that predicted the 3D structure of virtually every known protein. Before AlphaFold, determining a single protein structure could take a graduate student months or years of painstaking laboratory work using X-ray crystallography or cryo-electron microscopy. AlphaFold mapped over 200 million protein structures and has been used by more than 2 million researchers in 190 countries, including over 1 million users in low- and middle-income countries. That is not incremental progress. It is a phase change in how drug targets are identified.
But predicting protein structures is only the beginning. The harder question — the one that matters to patients — is whether AI can actually design drugs that work in human bodies. In June 2025, Nature Medicine published the most compelling answer yet.
Insilico Medicine’s rentosertib became the first AI-designed drug to complete a Phase 2a clinical trial with positive results. The GENESIS-IPF trial was a double-blind, placebo-controlled study that enrolled 71 patients with idiopathic pulmonary fibrosis — a devastating lung disease with few treatment options — across 22 sites in China. Patients receiving the highest dose (60 mg once daily) showed a mean improvement in lung function of 98.4 mL, compared to a decline of 20.3 mL in the placebo group. Biomarker analysis showed dose-dependent reductions in profibrotic proteins like COL1A1 and FAP, along with increases in the anti-inflammatory marker IL-10.
What makes this trial significant is not just the positive result. It is the fact that both the drug target (TNIK) and the molecule itself were identified using Insilico’s generative AI platform, Pharma.AI. This was a genuine end-to-end demonstration: AI chose what to attack, designed the weapon, and the weapon worked.
| AI Drug Discovery Milestone | Year | Significance |
|---|---|---|
| AlphaFold predicts 200M+ protein structures | 2020-2024 | Nobel Prize in Chemistry (2024) |
| FDA authorizes 1,300+ AI medical devices | 2015-2025 | 80% focused on medical imaging |
| AI drug candidates: 90% Phase I success rate | 2024 | vs. 40-65% industry average |
| Rentosertib Phase 2a positive results | 2025 | First AI-designed drug to show efficacy |
| Zasocitinib enters Phase III trials | 2025 | AI-designed TYK2 inhibitor (Nimbus/Takeda) |
A 2024 industry analysis found that AI-assisted drug candidates achieved Phase I clinical trial success rates of nearly 90 percent, compared to the industry average of 40 to 65 percent. That gap suggests AI is not just speeding up the process — it is selecting better candidates from the start, weeding out molecules that would fail before they ever reach a human subject. Traditional drug development is a decade-long process of expensive failures. AI appears to be compressing that failure into the computational phase, where a discarded candidate costs compute time instead of millions of dollars.
Zasocitinib, a TYK2 inhibitor originally designed using computational physics by Schrödinger and developed by Nimbus Therapeutics (whose TYK2 program Takeda acquired in a deal worth up to $6 billion), advanced into Phase III clinical trials in 2025. If it succeeds, it will be another proof point that AI-informed drug design produces compounds that survive the brutal attrition of clinical testing — where historically, 90 percent of drug candidates fail.
The clinical pipeline tells its own story. AI-related clinical trial activity has grown by 444 percent since 2019, with a compound annual growth rate of 40 percent. AI drug discovery has tracked almost identically, at 421 percent growth. These are no longer research curiosities. They are an emerging industrial process.
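Those two figures are mutually consistent, which is a useful sanity check: compounding 40 percent annually for the five years from 2019 lands close to the reported cumulative growth.

```python
# Sanity check: does a 40% CAGR square with ~444% total growth since 2019?
cagr = 0.40
years = 5  # 2019 -> 2024

multiple = (1 + cagr) ** years        # ~5.38x starting volume
total_growth_pct = (multiple - 1) * 100

print(f"{multiple:.2f}x over {years} years = {total_growth_pct:.0f}% total growth")
```

Compounding gives roughly 438 percent, in the same ballpark as the reported 444 percent — the small difference is consistent with rounding in the quoted CAGR.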
Frequently Asked Questions
**Has AI really caught cancers that doctors missed?**

Yes, in documented, peer-reviewed cases. The MASAI trial in Sweden showed AI-assisted mammography detected 81 percent of breast cancers at screening versus 74 percent with standard radiologist reading, catching tumors that experienced radiologists alone missed. It also reduced the number of aggressive interval cancers by 16 percent. Separately, Harvard’s CHIEF model identified cancer across 19 tumor types with 94 percent accuracy when validated across 24 hospitals on 32 independent datasets. These are not anecdotes — they are results from studies published in Nature and The Lancet.
**Will AI replace doctors?**

Not yet, and probably not for a long time. A 2025 meta-analysis of 83 studies found that physicians still outperform AI by an average of 14.4 percentage points in diagnostic accuracy. Expert physicians do even better. AI’s current strength is as a second pair of eyes — catching things humans miss due to fatigue or volume, and elevating the performance of less experienced clinicians. The 2025 Lancet colonoscopy study also shows that over-reliance on AI can erode doctors’ own detection skills by roughly 20 percent, which argues strongly for AI as a supplement rather than a replacement.
**When will the first AI-designed drug reach patients?**

The most advanced candidate is Insilico Medicine’s rentosertib, which completed a positive Phase 2a trial for idiopathic pulmonary fibrosis in 2025, with results published in Nature Medicine. Zasocitinib, designed with AI-assisted computational methods, is already in Phase III trials. If current timelines hold, the first fully AI-designed drug could reach market approval between 2027 and 2028. The pipeline is encouraging: AI-assisted drug candidates have shown Phase I success rates near 90 percent, roughly double the traditional industry average.