CHECKMATE

2026 Genomic Diagnostics Winner

CheckMate is an AI system designed to improve rare disease diagnosis by helping clinicians determine not only which diseases are most likely, but also which symptom, exam, or test would be most useful to evaluate next. Using Human Phenotype Ontology (HPO) terms, pretrained knowledge graph embeddings, and a Partial Variational Autoencoder, it generates an uncertainty-aware disease ranking and applies information gain to identify the next step most likely to refine the diagnosis. At the same time, CheckMate incorporates equity-aware calibration and cost-sensitive recommendation scoring so that patients with sparse records or limited access to expensive testing can still receive diagnostic guidance.

PROJECT SUMMARY

Introduction

Rare diseases affect more than 300 million people worldwide, yet patients still wait an average of over five years for an accurate diagnosis. In many cases, the challenge is not simply identifying the right disease, but determining the most informative and feasible next step in the diagnostic process. Existing rare disease AI systems can rank candidate disorders from Human Phenotype Ontology (HPO) terms, but they often stop at prediction. Clinicians are still left asking what to evaluate next, whether that means a physical exam finding, laboratory study, or an advanced genetic test. This challenge is especially severe in under-resourced settings, where the most statistically useful test may be financially or logistically inaccessible.

Just over a week before the hackathon, researchers at the SJTU School of Medicine published DeepRare, a state-of-the-art multi-agent rare disease diagnosis model that estimates the probability that a patient has each candidate rare disease. We built CheckMate to ingest these probabilities and recommend which steps the clinician should take next. CheckMate uses a Partial Variational Autoencoder (Partial VAE) to produce an uncertainty-aware disease distribution. From there, it evaluates which unobserved phenotype would be most informative to assess next, using an information gain framework inspired by EDDI (Efficient Dynamic Discovery of high-value Information), which scores each unobserved phenotype by how much it would narrow the differential if checked. To ensure that sparse documentation does not translate into worse recommendations, CheckMate incorporates group-conditional conformal prediction and cost-aware recommendation adjustment. The result is a diagnostic assistant that not only predicts what disease a patient may have, but also recommends the best next move in a way that is clinically actionable.

Approach

CheckMate is built as a three-part rare disease decision engine.

First, it generates a calibrated differential diagnosis over 12,971 rare disease classes. We begin from DeepRare’s structured output and pass the patient’s phenotype representation into a Partial VAE trained independently on 226,000 synthetic patients derived from the Human Phenotype Ontology Annotation database (HPOA). The VAE uses frozen pretrained SHEPHERD knowledge graph embeddings as biologically informed priors, allowing the model to reason effectively even when patient records are sparse. Instead of forcing a single predicted label, the model represents the case as a latent probability distribution, producing an uncertainty-aware shortlist of candidate diseases.

Second, CheckMate identifies the most informative next phenotype to evaluate. The VAE jointly predicts both disease probabilities and probabilities over unobserved HPO terms. For each candidate phenotype, the system simulates two futures: one in which the phenotype is present and one in which it is absent. It then measures how much each outcome would shift the disease posterior using KL divergence and weights these shifts by the predicted probability of each outcome. This yields an expected information gain score for every unobserved phenotype, allowing CheckMate to recommend the next highest-yield clinical test.

Third, CheckMate embeds equity directly into the recommendation loop. When records are sparse, raw information gain alone may favor expensive or inaccessible tests. To address this, CheckMate divides each candidate’s information gain by a clinical cost tier, so low-cost observations such as history questions or physical exam findings can outrank high-cost studies when documentation is limited. In parallel, the system applies group-conditional conformal calibration across three documentation-depth strata (sparse, moderate, and well-documented) to guarantee at least 90% diagnostic coverage within each subgroup. In our setting, standard conformal methods yielded only about 72% coverage for sparse patients against a 90% target, whereas group-conditional calibration reduced this disparity to approximately 1 percentage point. Rather than treating fairness as an afterthought, CheckMate uses model uncertainty itself as a signal to drive more accessible diagnostic recommendations.
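
The expected information gain and cost adjustment described in this section can be illustrated with a minimal sketch. It is not CheckMate's implementation: a fixed disease-by-phenotype frequency table with a plain Bayes update stands in for re-running the Partial VAE, and the three diseases, the phenotype names, the frequencies, and the cost tiers are all hypothetical values chosen for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; terms where p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.clip(q[mask], eps, None))))

def expected_information_gain(posterior, p_present_given_disease):
    """Expected KL shift of the disease posterior from checking one phenotype."""
    p_present = float(posterior @ p_present_given_disease)
    outcomes = (
        (p_present, p_present_given_disease),              # future 1: present
        (1.0 - p_present, 1.0 - p_present_given_disease),  # future 2: absent
    )
    gain = 0.0
    for p_outcome, likelihood in outcomes:
        if p_outcome <= 0.0:
            continue  # an impossible outcome contributes nothing
        updated = posterior * likelihood / p_outcome  # Bayes update (assumed stand-in)
        gain += p_outcome * kl_divergence(updated, posterior)
    return gain

def rank_next_phenotypes(posterior, freq, names, cost_tiers):
    """Return (name, raw gain, gain / cost tier), best cost-adjusted score first."""
    rows = []
    for j, name in enumerate(names):
        eig = expected_information_gain(posterior, freq[:, j])
        rows.append((name, eig, eig / cost_tiers[j]))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical toy case: 3 candidate diseases, 3 unobserved phenotypes.
posterior = np.array([0.5, 0.3, 0.2])
names = ["history: seizures", "brain MRI finding", "uninformative finding"]
freq = np.array([            # P(phenotype present | disease), illustrative only
    [0.90, 0.95, 0.50],
    [0.10, 0.05, 0.50],
    [0.10, 0.05, 0.50],
])
cost_tiers = [1, 3, 1]       # hypothetical tiers: 1 = history/exam, 3 = imaging

ranked = rank_next_phenotypes(posterior, freq, names, cost_tiers)
for name, eig, score in ranked:
    print(f"{name:24s} gain={eig:.3f} cost-adjusted={score:.3f}")
```

In this toy case the imaging finding carries the larger raw gain, but dividing by its cost tier lets the history question rank first, which is the behavior the cost adjustment is meant to produce. A phenotype that is equally likely under every disease scores essentially zero.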

Implementation and Results

We implemented CheckMate using frozen 64-dimensional SHEPHERD embeddings, a permutation-invariant Partial VAE, and synthetic patient training data generated from HPOA disease-phenotype frequency tables. During training, we applied 30-70% random masking to simulate incomplete clinical records and force the model to reason under missingness. The architecture decodes into both a disease probability distribution and a phenotype acquisition distribution, enabling simultaneous diagnostic ranking and next-step recommendation. Training was run on Brown University’s Oscar HPC cluster across five SLURM jobs, including dedicated embedding extraction and batched information gain computation. The final model was trained for 45 epochs on an NVIDIA RTX A5500 and achieved 64.2% top-1 disease accuracy and 82.5% top-5 recall across 12,971 disease classes. After temperature scaling with a fitted temperature of 1.39, the model achieved an Expected Calibration Error of 0.021, demonstrating strong calibration for uncertainty-sensitive clinical use.
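
For readers unfamiliar with the calibration step, the following sketch shows one common way temperature scaling and Expected Calibration Error can be computed. It is an illustration on synthetic logits, not CheckMate's training code: the grid search over temperatures, the 15 equal-width confidence bins, and the sharpening factor of 1.5 are assumptions made for the example and are unrelated to the figures reported above.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a scalar temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=15):
    """Bin-weighted mean gap between top-1 confidence and top-1 accuracy."""
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature that minimizes held-out NLL (simple grid search)."""
    if grid is None:
        grid = np.linspace(0.5, 3.0, 51)
    rows = np.arange(len(labels))

    def nll(t):
        return -np.mean(np.log(softmax(logits, t)[rows, labels] + 1e-12))

    return float(min(grid, key=nll))

# Synthetic held-out set: labels are drawn from calibrated probabilities,
# then the logits are sharpened so the "model" is overconfident.
rng = np.random.default_rng(0)
n, k = 5000, 5
base_logits = 2.0 * rng.normal(size=(n, k))
true_probs = softmax(base_logits)
u = rng.random(n)
labels = np.minimum((true_probs.cumsum(axis=1) < u[:, None]).sum(axis=1), k - 1)
overconfident_logits = 1.5 * base_logits  # illustrative sharpening factor

temperature = fit_temperature(overconfident_logits, labels)
ece_before = expected_calibration_error(softmax(overconfident_logits), labels)
ece_after = expected_calibration_error(
    softmax(overconfident_logits, temperature), labels
)
print(f"T={temperature:.2f}  ECE before={ece_before:.3f}  after={ece_after:.3f}")
```

Because temperature scaling divides every logit by the same scalar, it leaves the disease ranking unchanged and only adjusts how confident the reported probabilities are.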

Together, these results support CheckMate’s central contribution: reframing rare disease diagnosis as an equity-aware decision process rather than a static ranking task. Instead of only producing a differential, the system helps clinicians decide what to evaluate next, while adapting its recommendations when documentation is sparse and ensuring that uncertainty is honestly represented in the diagnostic shortlist. This makes CheckMate particularly well suited for real-world clinical settings, where limited resources and incomplete records often make the next step just as important as the prediction itself.

Future Developments

The next stage of CheckMate is real-world validation and clinical integration. Our top priority is testing the acquisition loop on real patient cases to measure how many steps are required to reach a correct diagnosis, the total diagnostic cost per case, and whether performance remains equitable across patient groups with different levels of documentation. We also plan to externally validate our conformal calibration thresholds on RareBench-style benchmark datasets to ensure that our coverage guarantees remain robust outside synthetic training data. In addition, we aim to evaluate directly whether predicted information gain aligns with true posterior shifts once phenotype outcomes are observed in practice. This will let us assess not just whether CheckMate ranks diseases accurately, but whether it recommends the right next move. Finally, we plan to map phenotype recommendations to standardized procedure codes and integrate the system with EHR pipelines, enabling CheckMate to operate on real patient records and fit naturally into clinical workflows. Together, these next steps would move CheckMate from a promising proof of concept toward a deployable clinical decision-support system for equitable rare disease diagnosis.
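
The per-stratum coverage check mentioned above could be prototyped along the following lines. This is a sketch of split conformal prediction with one threshold per documentation stratum, run on simulated probabilities. The simulator, the signal strengths per stratum, and the score definition (one minus the probability assigned to the true disease) are assumptions for illustration and may differ from what CheckMate uses.

```python
import numpy as np

STRATA = {"sparse": 1.0, "moderate": 2.5, "well_documented": 4.0}  # assumed signal

def simulate(n, rng, n_classes=20):
    """Hypothetical data: sparser records give the true class a weaker signal."""
    groups = rng.choice(list(STRATA), size=n)
    labels = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += np.array([STRATA[g] for g in groups])
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, labels, groups

def true_class_scores(probs, labels):
    """Nonconformity score: 1 minus the probability of the true disease."""
    return 1.0 - probs[np.arange(len(labels)), labels]

def conformal_threshold(scores, alpha=0.10):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))

def coverage_by_group(probs, labels, groups, thresholds):
    """Fraction of cases whose true disease falls inside the prediction set."""
    scores = true_class_scores(probs, labels)
    limits = np.array([thresholds[str(g)] for g in groups])
    covered = scores <= limits
    return {g: float(covered[groups == g].mean()) for g in STRATA}

rng = np.random.default_rng(0)
cal_probs, cal_labels, cal_groups = simulate(3000, rng)
test_probs, test_labels, test_groups = simulate(3000, rng)
cal_scores = true_class_scores(cal_probs, cal_labels)

# One pooled threshold versus one threshold per documentation stratum.
pooled = {g: conformal_threshold(cal_scores) for g in STRATA}
conditional = {g: conformal_threshold(cal_scores[cal_groups == g]) for g in STRATA}

pooled_coverage = coverage_by_group(test_probs, test_labels, test_groups, pooled)
conditional_coverage = coverage_by_group(
    test_probs, test_labels, test_groups, conditional
)
for g in STRATA:
    print(f"{g:16s} pooled={pooled_coverage[g]:.3f} "
          f"group-conditional={conditional_coverage[g]:.3f}")
```

Running the same comparison on an external benchmark would amount to replacing the simulated test split with real cases while keeping the thresholds fixed from calibration.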

Citations

Alsentzer E, Li MM et al. Few-shot learning for phenotype-driven diagnosis of patients with rare genetic diseases. npj Digital Medicine 8:380, 2025. doi:10.1038/s41746-025-01749-1
Ma C et al. EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE. ICML 2019, PMLR 97:4234–4243.
Angelopoulos AN, Bates S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. arXiv:2107.07511, 2021.
Köhler S et al. The Human Phenotype Ontology in 2021. Nucleic Acids Research, 49(D1):D1207–D1217, 2021. doi:10.1093/nar/gkaa1043
Faye E et al. Time to diagnosis and determinants of diagnostic delays of people living with a rare disease. Eur J Hum Genet 32(9):1116–1126, 2024. doi:10.1038/s41431-024-01604-z
7.6-year US figure: Global Genes / NORD RARE Insights Survey, 2013 (n=631, UK & US families).
Kingma DP, Welling M. Auto-Encoding Variational Bayes. ICLR 2014. arXiv:1312.6114.
Tang Z et al. An agentic system for rare disease diagnosis with traceable reasoning. Nature, 2025. doi:10.1038/s41586-025-10097-9 Recall@1 57.18% on HPO-based benchmarks (baseline for CheckMate comparison).
Chen X et al. RareBench: Can LLMs Serve as Rare Diseases Specialists? KDD 2024, pp. 4850–4861. doi:10.1145/3637528.3671576 Open-source benchmark; RAMEDIS, MME, HMS, LIRICAL, PUMCH_ADM datasets.

MEET THE TEAM

Sanil Desai
Brown University
Undergraduate (2028)
Computer Science and Math

Henry Greenhut
Brown University
Undergraduate (2028)
Computer Science and Physics

Akshay Vakharia
Brown University
Undergraduate (2028)
Computational Biology