The Manrai Lab is a team of machine learning scientists, clinicians, and biomedical data scientists working to improve medical decision making by developing computational approaches that incorporate rich and deep representations of clinical state and an individual's identity into care. Active projects include:
Improving genetic variant classification and quantifying risk ("penetrance") in clinical genomics, with a focus on inherited heart disease (e.g. Manrai et al. NEJM 2016)
Measuring "normal" variation for blood laboratory biomarkers across populations with a focus on creatinine and kidney disease (e.g. Manrai et al. JAMA 2018)
Developing semi-supervised learning approaches with applications including medical imaging and text (e.g. Melas-Kyriazi & Manrai 2020)
Modeling reproducibility in integrative biomedical studies using meta-science ("science of science") approaches (e.g. Manrai et al. AJE 2019)
The group's research has been published in the New England Journal of Medicine and JAMA, presented at the National Academy of Sciences, and featured in the New York Times, Wall Street Journal, and NPR.
Research Background
Arjun (Raj) Manrai is an Assistant Professor at Harvard Medical School and Faculty Member in the Computational Health Informatics Program (CHIP) at Boston Children’s Hospital. Manrai received an A.B. in Physics with Highest Honors from Harvard and earned his Ph.D. in Bioinformatics and Integrative Genomics from the Harvard-MIT Division of Health Sciences and Technology.
Publications
Tackling algorithmic bias and promoting transparency in health datasets: the STANDING Together consensus recommendations. Lancet Digit Health. 2025 Jan; 7(1):e64-e88.
Large Language Models and the Degradation of the Medical Record. N Engl J Med. 2024 Oct 31; 391(17):1561-1564.
Medical Artificial Intelligence and Human Values. Reply. N Engl J Med. 2024 Sep 26; 391(12):1167-1168.
Projected Changes in Statin and Antihypertensive Therapy Eligibility With the AHA PREVENT Cardiovascular Risk Equations. JAMA. 2024 Sep 24; 332(12):989-1000.
Medical Artificial Intelligence and Human Values. N Engl J Med. 2024 May 30; 390(20):1895-1904.
Discordance between a deep learning model and clinical-grade variant pathogenicity classification in a rare disease cohort. medRxiv. 2024 May 23.
Implications of Race Adjustment in Lung-Function Equations. N Engl J Med. 2024 Jun 13; 390(22):2083-2097.
Heterogeneity in elevated glucose and A1C as predictors of the prediabetes to diabetes transition: Framingham Heart Study, Multi-Ethnic Study on Atherosclerosis, Jackson Heart Study, and Atherosclerosis Risk In Communities. medRxiv. 2024 Apr 08.
To do no harm - and the most good - with AI in health care. Nat Med. 2024 Mar; 30(3):623-627.
Assessing the genetic contribution of cumulative behavioral factors associated with longitudinal type 2 diabetes risk highlights adiposity and the brain-metabolic axis. medRxiv. 2024 Jan 31.
Decoding the exposome: data science methodologies and implications in exposome-wide association studies (ExWASs). Exposome. 2024; 4(1):osae001.
Prediction and stratification of longitudinal risk for chronic obstructive pulmonary disease across smoking behaviors. Nat Commun. 2023 Dec 14; 14(1):8297.
Artificial Intelligence vs Clinician Performance in Estimating Probabilities of Diagnoses Before and After Testing. JAMA Netw Open. 2023 Dec 01; 6(12):e2347075.
Publisher Correction: Scientific discovery in the age of artificial intelligence. Nature. 2023 Sep; 621(7978):E33.
Scientific discovery in the age of artificial intelligence. Nature. 2023 Aug; 620(7972):47-60.
Prediction and stratification of longitudinal risk for chronic obstructive pulmonary disease across smoking behaviors. medRxiv. 2023 Apr 05.
Artificial Intelligence in Medicine. N Engl J Med. 2023 Mar 30; 388(13):1220-1221.
National Projections for Clinical Implications of Race-Free Creatinine-Based GFR Estimating Equations. J Am Soc Nephrol. 2023 Feb 01; 34(2):309-321.
Positive Predictive Value of the Thumb-Palm Test for General Population Screening of Ascending Aortic Aneurysm. Am J Cardiol. 2021 Dec 15; 161:116-117.
Leveraging vibration of effects analysis for robust discovery in observational biomedical data science. PLoS Biol. 2021 Sep; 19(9):e3001398.
Data Mining Approaches to Reference Interval Studies. Clin Chem. 2021 Sep 01; 67(9):1175-1181.
Foundational Considerations for Artificial Intelligence Using Ophthalmic Images. Ophthalmology. 2022 Feb; 129(2):e14-e32.
Race-Free Equations for eGFR: Comparing Effects on CKD Classification. J Am Soc Nephrol. 2021 Aug; 32(8):1868-1870.
Physicians, Probabilities, and Populations-Estimating the Likelihood of Disease for Common Clinical Scenarios. JAMA Intern Med. 2021 Jun 01; 181(6):756-757.
Removing Race From Kidney Function Estimates-Reply. JAMA. 2021 May 18; 325(19):2018-2019.
Association of 152 Biomarker Reference Intervals with All-Cause Mortality in Participants of a General United States Survey from 1999 to 2010. Clin Chem. 2021 Mar 01; 67(3):500-507.
Harmonizing the Collection of Clinical Data on Genetic Testing Requisition Forms to Enhance Variant Interpretation in Hypertrophic Cardiomyopathy (HCM): A Study from the ClinGen Cardiomyopathy Variant Curation Expert Panel. J Mol Diagn. 2021 May; 23(5):589-598.
Comparisons of Polyexposure, Polygenic, and Clinical Risk Scores in Risk Prediction of Type 2 Diabetes. Diabetes Care. 2021 Apr; 44(4):935-943.
Clinical Implications of Removing Race From Estimates of Kidney Function. JAMA. 2021 Jan 12; 325(2):184-186.
In Search of a Better Equation - Performance and Equity in Estimates of Kidney Function. N Engl J Med. 2021 Feb 04; 384(5):396-399.
What about the environment? Leveraging multi-omic datasets to characterize the environment's role in human health. Pac Symp Biocomput. 2021; 26:309-315.
Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services. J Am Med Inform Assoc. 2020 Sep 01; 27(9):1425-1430.
Prediction of chronological and biological age from laboratory data. Aging (Albany NY). 2020 May 05; 12(9):7626-7638.
Challenges to the Reproducibility of Machine Learning Models in Health Care. JAMA. 2020 Jan 28; 323(4):305-306.
Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets. Am J Epidemiol. 2019 May 01; 188(5):846-850.
Author Correction: Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes. Nat Genet. 2019 Apr; 51(4):764-765.
Potential Excessive Testing at Scale: Biomarkers, Genomics, and Machine Learning. JAMA. 2019 Feb 26; 321(8):739-740.
Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes. Nat Genet. 2019 Feb; 51(2):327-334.
Using Big Data to Determine Reference Values for Laboratory Tests-Reply. JAMA. 2018 Oct 09; 320(14):1496.
In the Era of Precision Medicine and Big Data, Who Is Normal? JAMA. 2018 May 15; 319(19):1981-1982.
Biomedical informatics and machine learning for clinical genomics. Hum Mol Genet. 2018 May 01; 27(R1):R29-R34.
Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen's Inherited Cardiomyopathy Expert Panel. Genet Med. 2018 Mar; 20(3):351-359.
Association of Sex With Recurrence of Autism Spectrum Disorder Among Siblings. JAMA Pediatr. 2017 Nov 01; 171(11):1107-1112.
Systematic correlation of environmental exposure and physiological and self-reported behaviour factors with leukocyte telomere length. Int J Epidemiol. 2017 Feb 01; 46(1):44-56.
Methods to Ensure the Reproducibility of Biomedical Research. Pac Symp Biocomput. 2017; 22:117-119.
Informatics and Data Analytics to Support Exposome-Based Discovery for Public Health. Annu Rev Public Health. 2017 Mar 20; 38:279-294.
Genetic Misdiagnoses and the Potential for Health Disparities. N Engl J Med. 2016 Aug 18; 375(7):655-65.
Clinical Genomics: From Pathogenicity Claims to Quantitative Risk Estimates. JAMA. 2016 Mar 22-29; 315(12):1233-4.
Reproducible and Shareable Quantifications of Pathogenicity. Pac Symp Biocomput. 2016; 21:231-42.
Methods to Enhance the Reproducibility of Precision Medicine. Pac Symp Biocomput. 2016; 21:180-182.
Development of exposome correlation globes to map out environment-wide associations. Pac Symp Biocomput. 2015; 231-42.
Enriched protein screening of human bone marrow mesenchymal stromal cell secretions reveals MFAP5 and PENK as novel IL-10 modulators. Mol Ther. 2014 May; 22(5):999-1007.
Urinary-cell mRNA and acute kidney-transplant rejection. N Engl J Med. 2013 Nov 07; 369(19):1859.
CEAS: cis-regulatory element annotation system. Bioinformatics. 2009 Oct 01; 25(19):2605-6.
Androgen receptor regulates a distinct transcription program in androgen-independent prostate cancer. Cell. 2009 Jul 23; 138(2):245-56.
The geometry of multisite phosphorylation. Biophys J. 2008 Dec 15; 95(12):5533-43.