Guergana Savova | Boston Children's Research

Research Overview

Dr. Guergana Savova's research interests are in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative), a field usually referred to as clinical NLP. She has been creating gold-standard annotated resources based on computable definitions and developing methods for computable solutions. The focus of Dr. Savova's research is higher-level semantic and discourse processing of the clinical narrative, which includes tasks such as named entity recognition, event recognition, and relation detection and classification, including coreference and temporal relations. The methods are mostly machine learning, spanning supervised, lightly supervised, and completely unsupervised approaches.

Dr. Savova's research with her collaborators has led to the creation of the clinical Text Analysis and Knowledge Extraction System (cTAKES; http://sourceforge.net/projects/ohnlp/files/cTAKES/), which has been released as an open source application under an Apache license. cTAKES is an information extraction system comprising a number of NLP components. It has been applied to mine the data within the clinical narrative for a number of biomedical use cases, within initiatives such as i2b2, PGRN and eMERGE, to name a few. Within Informatics for Integrating Biology and the Bedside (i2b2), cTAKES has been used to extract patient characteristics for determining their status related to a specific phenotype (Multiple Sclerosis, Inflammatory Bowel Disease, Type 2 Diabetes). Within the Pharmacogenomics Research Network (PGRN), cTAKES has been applied to automatically determine patients' disease activity and detect responders versus non-responders to a specific treatment. Within the Electronic Medical Records and Genomics (eMERGE) network, cTAKES has been applied to automatically discover patients with Peripheral Arterial Disease.

Dr. Savova's NLP collaborators include Profs. Martha Palmer, James Martin and Wayne Ward from the University of Colorado, Prof. Wendy Chapman from the University of California at San Diego, Prof. Noemie Elhadad from Columbia University, Drs. Lynette Hirschman, Cheryl Clark and John Aberdeen from the MITRE Corporation, Prof. James Pustejovsky from Brandeis University, and Prof. Rebecca Crowley from the University of Pittsburgh. Dr. Savova is the recipient of NIH funding for multiple projects, which are listed separately on this website.

Research Background

Dr. Guergana Savova is a reviewer for the Journal of the American Medical Informatics Association (JAMIA), the Journal of Biomedical Informatics (JBI) and many conferences and workshops. She is also a member of the National Library of Medicine's Biomedical Library and Informatics Review Committee.

Dr. Guergana Savova holds a PhD in Linguistics with a minor in Cognitive Science and a Master of Science in Computer Science from the University of Minnesota. Before joining CHIP and HMS in 2010, Dr. Savova was a member of the Biomedical Statistics and Informatics Department faculty at the Mayo Clinic (2002-2010).

Publications

Evaluating large language model performance to support the diagnosis and management of patients with primary immune disorders. J Allergy Clin Immunol. 2025 Feb 14. View Abstract
Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation. bioRxiv. 2025 Jan 29. View Abstract
The TRIPOD-LLM reporting guideline for studies using large language models. Nat Med. 2025 Jan; 31(1):60-69. View Abstract
A New Era of Data-Driven Cancer Research and Care: Opportunities and Challenges. Cancer Discov. 2024 Oct 04; 14(10):1774-1778. View Abstract
The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use. medRxiv. 2024 Jul 25. View Abstract
Family history as the strongest predictor of aortic and peripheral aneurysms in patients with intracranial aneurysms. J Clin Neurosci. 2024 Aug; 126:128-134. View Abstract
The effect of using a large language model to respond to patient messages. Lancet Digit Health. 2024 Jun; 6(6):e379-e381. View Abstract
Evaluating the ChatGPT family of models for biomedical reasoning and classification. J Am Med Inform Assoc. 2024 04 03; 31(4):940-948. View Abstract
Considerations for Prompting Large Language Models-Reply. JAMA Oncol. 2024 Apr 01; 10(4):538-539. View Abstract
Large language models to identify social determinants of health in electronic health records. NPJ Digit Med. 2024 Jan 11; 7(1):6. View Abstract
Improving model transferability for clinical note section classification models using continued pretraining. J Am Med Inform Assoc. 2023 12 22; 31(1):89-97. View Abstract
DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. medRxiv. 2023 Oct 26. View Abstract
Use of Artificial Intelligence Chatbots for Cancer Treatment Information. JAMA Oncol. 2023 10 01; 9(10):1459-1462. View Abstract
DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. JCO Clin Cancer Inform. 2023 09; 7:e2300156. View Abstract
End-to-end clinical temporal information extraction with multi-head attention. Proc Conf Assoc Comput Linguist Meet. 2023 Jul; 2023:313-319. View Abstract
Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy. JCO Clin Cancer Inform. 2023 07; 7:e2300048. View Abstract
Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes. JCO Clin Cancer Inform. 2023 05; 7:e2200196. View Abstract
Improving Model Transferability for Clinical Note Section Classification Models Using Continued Pretraining. medRxiv. 2023 Apr 24. View Abstract
An End-to-End Natural Language Processing System for Automatically Extracting Radiation Therapy Events From Clinical Texts. Int J Radiat Oncol Biol Phys. 2023 09 01; 117(1):262-273. View Abstract
Geometric Features Associated with Middle Cerebral Artery Bifurcation Aneurysm Formation: A Matched Case-Control Study. J Stroke Cerebrovasc Dis. 2022 Mar; 31(3):106268. View Abstract
Open-source Software Sustainability Models: Initial White Paper From the Informatics Technology for Cancer Research Sustainability and Industry Partnership Working Group. J Med Internet Res. 2021 12 02; 23(12):e20028. View Abstract
Tobacco use and age are associated with different morphologic features of anterior communicating artery aneurysms. Sci Rep. 2021 02 26; 11(1):4791. View Abstract
Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys. 2021 Jul 01; 110(3):641-655. View Abstract
Morphological variables associated with ruptured basilar tip aneurysms. Sci Rep. 2021 01 28; 11(1):2526. View Abstract
Geometric variations associated with posterior communicating artery aneurysms. J Neurointerv Surg. 2021 Nov; 13(11):1049-1052. View Abstract
Vascular Geometry Associated with Anterior Communicating Artery Aneurysm Formation. World Neurosurg. 2021 02; 146:e1318-e1325. View Abstract
Surrounding vascular geometry associated with basilar tip aneurysm formation. Sci Rep. 2020 10 21; 10(1):17928. View Abstract
Adverse drug event presentation and tracking (ADEPT): semiautomated, high throughput pharmacovigilance using real-world data. JAMIA Open. 2020 Oct; 3(3):413-421. View Abstract
Age and morphology of posterior communicating artery aneurysms. Sci Rep. 2020 07 14; 10(1):11545. View Abstract
Mining Misdiagnosis Patterns from Biomedical Literature. AMIA Jt Summits Transl Sci Proc. 2020; 2020:360-366. View Abstract
Interactive Exploration of Longitudinal Cancer Patient Histories Extracted From Clinical Text. JCO Clin Cancer Inform. 2020 05; 4:412-420. View Abstract
Does BERT need domain adaptation for clinical negation detection? J Am Med Inform Assoc. 2020 04 01; 27(4):584-591. View Abstract
Adverse drug event rates in pediatric pulmonary hypertension: a comparison of real-world data sources. J Am Med Inform Assoc. 2020 02 01; 27(2):294-300. View Abstract
High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc. 2019 12; 14(12):3426-3444. View Abstract
Use of Narrative Concepts in Electronic Health Records to Validate Associations Between Genetic Factors and Response to Treatment of Inflammatory Bowel Diseases. Clin Gastroenterol Hepatol. 2020 07; 18(8):1890-1892. View Abstract
Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Res. 2019 11 01; 79(21):5463-5470. View Abstract
Morphological Variables Associated With Ruptured Middle Cerebral Artery Aneurysms. Neurosurgery. 2019 07 01; 85(1):75-83. View Abstract
Supervised methods to extract clinical events from cardiology reports in Italian. J Biomed Inform. 2019 07; 95:103219. View Abstract
Decreased Total Iron Binding Capacity May Correlate with Ruptured Intracranial Aneurysms. Sci Rep. 2019 04 15; 9(1):6054. View Abstract
Potential Impact of Initial Clinical Data on Adjustment of Pediatric Readmission Rates. Acad Pediatr. 2019 07; 19(5):589-598. View Abstract
Elevated International Normalized Ratio Is Associated With Ruptured Aneurysms. Stroke. 2018 09; 49(9):2046-2052. View Abstract
Association between aspirin dose and subarachnoid hemorrhage from saccular aneurysms: A case-control study. Neurology. 2018 09 18; 91(12):e1175-e1181. View Abstract
Low Serum Calcium and Magnesium Levels and Rupture of Intracranial Aneurysms. Stroke. 2018 07; 49(7):1747-1750. View Abstract
Lipid-Lowering Agents and High HDL (High-Density Lipoprotein) Are Inversely Associated With Intracranial Aneurysm Rupture. Stroke. 2018 05; 49(5):1148-1154. View Abstract
Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semantics. 2018 03 30; 9(1):12. View Abstract
Antihyperglycemic Agents Are Inversely Associated With Intracranial Aneurysm Rupture. Stroke. 2018 01; 49(1):34-39. View Abstract
Heroin Use Is Associated with Ruptured Saccular Aneurysms. Transl Stroke Res. 2018 08; 9(4):340-346. View Abstract
DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Res. 2017 11 01; 77(21):e115-e118. View Abstract
Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. Yearb Med Inform. 2017 Aug; 26(1):214-227. View Abstract
Phelan-McDermid syndrome data network: Integrating patient reported outcomes with clinical notes and curated genetic reports. Am J Med Genet B Neuropsychiatr Genet. 2018 10; 177(7):613-624. View Abstract
Association of intracranial aneurysm rupture with smoking duration, intensity, and cessation. Neurology. 2017 Sep 26; 89(13):1408-1415. View Abstract
Alcohol Consumption and Aneurysmal Subarachnoid Hemorrhage. Transl Stroke Res. 2018 02; 9(1):13-19. View Abstract
Towards generalizable entity-centric clinical coreference resolution. J Biomed Inform. 2017 05; 69:251-258. View Abstract
Large-scale identification of patients with cerebral aneurysms using natural language processing. Neurology. 2017 Jan 10; 88(2):164-168. View Abstract
An information model for computable cancer phenotypes. BMC Med Inform Decis Mak. 2016 09 15; 16(1):121. View Abstract
Suboptimal Clinical Documentation in Young Children with Severe Obesity at Tertiary Care Centers. Int J Pediatr. 2016; 2016:4068582. View Abstract
Electronic Health Record Based Algorithm to Identify Patients with Autism Spectrum Disorder. PLoS One. 2016; 11(7):e0159621. View Abstract
Developing an Algorithm to Detect Early Childhood Obesity in Two Tertiary Pediatric Medical Centers. Appl Clin Inform. 2016 07 20; 7(3):693-706. View Abstract
Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. J Biomed Semantics. 2016 Jul 01; 7:43. View Abstract
Comparative Effectiveness of Infliximab and Adalimumab in Crohn's Disease and Ulcerative Colitis. Inflamm Bowel Dis. 2016 Apr; 22(4):880-5. View Abstract
PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc. 2016 11; 23(6):1046-1052. View Abstract
Identification of Nonresponse to Treatment Using Narrative Data in an Electronic Health Record Inflammatory Bowel Disease Cohort. Inflamm Bowel Dis. 2016 Jan; 22(1):151-8. View Abstract
Semi-supervised Learning for Phenotyping Tasks. AMIA Annu Symp Proc. 2015; 2015:502-11. View Abstract
Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc. 2016 Mar; 23(2):387-95. View Abstract
Identification of subjects with polycystic ovary syndrome using electronic health records. Reprod Biol Endocrinol. 2015 Oct 29; 13:116. View Abstract
Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts. PLoS One. 2015; 10(8):e0136651. View Abstract
An Introduction to Natural Language Processing: How You Can Get More From Those Electronic Notes You Are Generating. Pediatr Emerg Care. 2015 Jul; 31(7):536-41. View Abstract
Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ. 2015 Apr 24; 350:h1885. View Abstract
Developing a section labeler for clinical documents. AMIA Annu Symp Proc. 2014; 2014:636-44. View Abstract
Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. J Am Med Inform Assoc. 2015 Apr; 22(e1):e151-61. View Abstract
Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc. 2015 Jan; 22(1):143-54. View Abstract
Temporal Annotation in the Clinical Domain. Trans Assoc Comput Linguist. 2014 Apr; 2:143-154. View Abstract
Carrell et al. respond to "Observational research and the EHR". Am J Epidemiol. 2014 Mar 15; 179(6):762-3. View Abstract
Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence. Am J Epidemiol. 2014 Mar 15; 179(6):749-58. View Abstract
Modeling disease severity in multiple sclerosis using electronic health records. PLoS One. 2013; 8(11):e78927. View Abstract
Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium. J Am Med Inform Assoc. 2013 Dec; 20(e2):e341-8. View Abstract
Discovering body site and severity modifiers in clinical texts. J Am Med Inform Assoc. 2014 May-Jun; 21(3):448-54. View Abstract
Improved de-identification of physician notes through integrative modeling of both public and private medical text. BMC Med Inform Decis Mak. 2013 Oct 02; 13:112. View Abstract
Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records. PLoS One. 2013; 8(8):e69932. View Abstract
Normalization of plasma 25-hydroxy vitamin D is associated with reduced risk of surgery in Crohn's disease. Inflamm Bowel Dis. 2013 Aug; 19(9):1921-7. View Abstract
Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach. Inflamm Bowel Dis. 2013 Jun; 19(7):1411-20. View Abstract
Formative evaluation of ontology learning methods for entity discovery by using existing ontologies as reference standards. Methods Inf Med. 2013; 52(4):308-16. View Abstract
Towards comprehensive syntactic and semantic annotations of the clinical narrative. J Am Med Inform Assoc. 2013 Sep-Oct; 20(5):922-30. View Abstract
Similar risk of depression and anxiety following surgery or hospitalization for Crohn's disease and ulcerative colitis. Am J Gastroenterol. 2013 Apr; 108(4):594-601. View Abstract
Psychiatric co-morbidity is associated with increased risk of surgery in Crohn's disease. Aliment Pharmacol Ther. 2013 Feb; 37(4):445-54. View Abstract
A common type system for clinical natural language processing. J Biomed Semantics. 2013 Jan 03; 4(1):1. View Abstract
Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013 Jan 01; 20(1):84-94. View Abstract
Anaphoric reference in clinical reports: characteristics of an annotated corpus. J Biomed Inform. 2012 Jun; 45(3):507-21. View Abstract
Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project. J Biomed Inform. 2012 Aug; 45(4):763-71. View Abstract
A system for coreference resolution for the clinical narrative. J Am Med Inform Assoc. 2012 Jul-Aug; 19(4):660-7. View Abstract
Automated discovery of drug treatment patterns for endocrine therapy of breast cancer within an electronic medical record. J Am Med Inform Assoc. 2012 Jun; 19(e1):e83-9. View Abstract
The MiPACQ clinical question answering system. AMIA Annu Symp Proc. 2011; 2011:171-80. View Abstract
The SHARPn project on secondary use of Electronic Medical Record data: progress, plans, and possibilities. AMIA Annu Symp Proc. 2011; 2011:248-56. View Abstract
Drug side effect extraction from clinical narratives of psychiatry and psychology patients. J Am Med Inform Assoc. 2011 Dec; 18 Suppl 1:i144-9. View Abstract
Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011 Sep-Oct; 18(5):540-3. View Abstract
Coreference resolution: a review of general methodologies and applications in the clinical domain. J Biomed Inform. 2011 Dec; 44(6):1113-22. View Abstract
Anaphoric relations in the clinical narrative: corpus creation. J Am Med Inform Assoc. 2011 Jul-Aug; 18(4):459-65. View Abstract
The emerging role of electronic medical records in pharmacogenomics. Clin Pharmacol Ther. 2011 Mar; 89(3):379-86. View Abstract
Discovering peripheral arterial disease cases from radiology notes using natural language processing. AMIA Annu Symp Proc. 2010 Nov 13; 2010:722-6. View Abstract
Classification of medication status change in clinical narratives. AMIA Annu Symp Proc. 2010 Nov 13; 2010:762-6. View Abstract
CNTRO: A Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives. AMIA Annu Symp Proc. 2010 Nov 13; 2010:787-91. View Abstract
Effectiveness of lexico-syntactic pattern matching for ontology enrichment with clinical documents. Methods Inf Med. 2011; 50(5):397-407. View Abstract
Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010 Sep-Oct; 17(5):507-13. View Abstract
Leveraging informatics for genetic studies: use of the electronic medical record to enable a genome-wide association study of peripheral arterial disease. J Am Med Inform Assoc. 2010 Sep-Oct; 17(5):568-74. View Abstract
The Rochester Epidemiology Project: exploiting the capabilities for population-based research in rheumatic diseases. Rheumatology (Oxford). 2011 Jan; 50(1):6-15. View Abstract
Towards temporal relation discovery from the clinical narrative. AMIA Annu Symp Proc. 2009 Nov 14; 2009:568-72. View Abstract
Mayo clinic smoking status classification system: extensions and improvements. AMIA Annu Symp Proc. 2009 Nov 14; 2009:619-23. View Abstract
Discerning tumor status from unstructured MRI reports--completeness of information in existing reports and utility of automated natural language processing. J Digit Imaging. 2010 Apr; 23(2):119-32. View Abstract
Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model. J Biomed Inform. 2009 Oct; 42(5):937-49. View Abstract
The first step toward data reuse: disambiguating concept representation of the locally developed ICU nursing flowsheets. Comput Inform Nurs. 2008 Sep-Oct; 26(5):282-9. View Abstract
Word sense disambiguation across two domains: biomedical literature and clinical notes. J Biomed Inform. 2008 Dec; 41(6):1088-100. View Abstract
Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008; 128-44. View Abstract
Mayo clinic NLP system for patient smoking status identification. J Am Med Inform Assoc. 2008 Jan-Feb; 15(1):25-8. View Abstract
Formalizing the International Classification of Functioning, Disability, and Health (ICF) using Formal Concept Analysis (FCA). AMIA Annu Symp Proc. 2007 Oct 11; 994. View Abstract
Toward near real-time acuity estimation: a feasibility study. Nurs Res. 2007 Jul-Aug; 56(4):288-94. View Abstract
Content coverage of SNOMED-CT toward the ICU nursing flowsheets and the acuity indicators. Stud Health Technol Inform. 2006; 122:722-6. View Abstract
Building and evaluating annotated corpora for medical NLP systems. AMIA Annu Symp Proc. 2006; 1050. View Abstract
Frame semantics and the domain of functioning, disability and health. AMIA Annu Symp Proc. 2005; 1106. View Abstract
A term extraction tool for expanding content in the domain of functioning, disability, and health: proof of concept. J Biomed Inform. 2003 Aug-Oct; 36(4-5):250-9. View Abstract
Testing the generalizability of the ISO model for nursing diagnoses. AMIA Annu Symp Proc. 2003; 274-8. View Abstract
A data-driven approach for extracting "the most specific term" for ontology development. AMIA Annu Symp Proc. 2003; 579-83. View Abstract

Contact Guergana Savova

Phone: 617-919-2972
