Publications

In Preparation

Song, Tzu-Hsi, Leonardo Clemente, Xiang Pan, Junbong Jang, Mauricio Santillana, and Kwonmoo Lee. In Preparation. “Fine-Grained Forecasting of COVID-19 Trends at the County Level in the United States”. MedRxiv, In Preparation. https://doi.org/10.1101/2024.01.13.24301248.

The coronavirus (COVID-19) pandemic has profoundly impacted various aspects of daily life, society, healthcare systems, and global health policies. This pandemic has resulted in more than one hundred million people being infected and, unfortunately, the loss of life for many individuals. Although treatment for the coronavirus is now available, effective forecasting of COVID-19 infection is of the utmost importance to aid public health officials in making critical decisions. However, forecasting COVID-19 trends through time-series analysis poses significant challenges due to the data’s inherently dynamic, transient, and noise-prone nature. In this study, we have developed the Fine-Grained Infection Forecast Network (FIGI-Net) model, which provides accurate forecasts of COVID-19 trends up to two weeks in advance. FIGI-Net addresses the current limitations in COVID-19 forecasting by leveraging fine-grained county-level data and a stacked bidirectional LSTM structure. We employ a pre-trained model to capture essential global infection patterns. Subsequently, these pre-trained parameters are transferred to train localized sub-models for county clusters exhibiting comparable infection dynamics. This model adeptly handles the sudden changes and rapid fluctuations frequently observed across times and locations in county-level data, ultimately improving the accuracy of COVID-19 infection forecasting at the county, state, and national levels. The FIGI-Net model demonstrated significant improvement over other deep learning-based models and state-of-the-art COVID-19 forecasting models across various standard evaluation metrics. Notably, the FIGI-Net model excels at forecasting the direction of infection trends, especially during the initial phases of different COVID-19 outbreak waves. Our study underscores the effectiveness and superiority of our time-series deep learning-based methods in addressing dynamic and sudden changes in infection numbers over short-term time periods. These capabilities facilitate efficient public health management and the early implementation of COVID-19 transmission prevention measures.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by NIH, United States (Grant Number: R35GM133725).
Data Availability: The data used in this study are publicly available and consist of daily cumulative COVID-19 infection and death counts reported for U.S. counties. The dataset was obtained from the Johns Hopkins Center for Systems Science and Engineering (CSSE) Coronavirus Resource Center and spans January 21st, 2020, to April 16th, 2022 [14]. It can be accessed directly from the Johns Hopkins CSSE repository (https://github.com/CSSEGISandData/COVID-19). Researchers interested in using the data for further analysis can refer to the original source for detailed documentation on data collection methods and definitions. For additional information or inquiries about the dataset, please visit the repository or contact the Johns Hopkins CSSE Coronavirus Resource Center.

Song, Tzu-Hsi, Mengzhi Cao, Jouha Min, Hyungsoon Im, Hakho Lee, and Kwonmoo Lee. In Preparation. “Interpretable Deep Learning for Breast Cancer Cell Phenotyping Using Diffraction Images from Lens-Free Digital In-Line Holography”. BioRxiv, In Preparation. https://doi.org/10.1101/2021.05.29.446284.

Lens-free digital in-line holography (LDIH) offers a wide field of view at micrometer-scale resolution, surpassing the capabilities of lens-based microscopes, making it a promising diagnostic tool for high-throughput cellular analysis. However, the complex nature of holograms renders them challenging for human interpretation, necessitating time-consuming computational processing to reconstruct object images. To address this, we present HoloNet, a novel deep learning architecture specifically designed for direct analysis of holographic images from LDIH in cellular phenotyping. HoloNet extracts both global features from diffraction patterns and local features from convolutional layers, achieving superior performance and interpretability compared to other deep learning methods. By leveraging raw holograms of breast cancer cells stained with well-known markers ER/PR and HER2, HoloNet demonstrates its effectiveness in classifying breast cancer cell types and quantifying molecular marker intensities. Furthermore, we introduce the feature-fusion HoloNet model, which extracts diffraction features associated with breast cancer cell types and their marker intensities. This hologram embedding approach allows for the identification of previously unknown subtypes of breast cancer cells, facilitating a comprehensive analysis of cell phenotype heterogeneity, leading to precise breast cancer diagnosis.
Competing Interest Statement: The authors have declared no competing interest.

Basher, Abdur Rahman M. A., Caleb Hallinan, and Kwonmoo Lee. In Preparation. “Heterogeneity-Preserving Discriminative Feature Selection for Subtype Discovery”. BioRxiv (In Revision), In Preparation. https://doi.org/10.1101/2023.05.14.540686.

The discovery of subtypes is pivotal for disease diagnosis and targeted therapy, considering the diverse responses of different cells or patients to specific treatments. Exploring the heterogeneity within disease or cell states provides insights into disease progression mechanisms and cell differentiation. The advent of high-throughput technologies has enabled the generation and analysis of various molecular data types, such as single-cell RNA-seq, proteomic, and imaging datasets, at large scales. While presenting opportunities for subtype discovery, these datasets pose challenges in finding relevant signatures due to their high dimensionality. Feature selection, a crucial step in the analysis pipeline, involves choosing signatures that reduce the feature size for more efficient downstream computational analysis. Numerous existing methods focus on selecting signatures that differentiate known diseases or cell states, yet they often fall short in identifying features that preserve heterogeneity and reveal subtypes. To identify features that can capture the diversity within each class while also maintaining the discrimination of known disease states, we employed deep metric learning-based feature embedding to conduct a detailed exploration of the statistical properties of features essential in preserving heterogeneity. Our analysis revealed that features with a significant difference in interquartile range (IQR) between classes possess crucial subtype information. Guided by this insight, we developed a robust statistical method, termed PHet (Preserving Heterogeneity), that performs iterative subsampling differential analysis of IQR and Fisher’s method between classes, identifying a minimal set of heterogeneity-preserving discriminative features to optimize subtype clustering quality.
Validation using public single-cell RNA-seq and microarray datasets showcased PHet’s effectiveness in preserving sample heterogeneity while maintaining discrimination of known disease/cell states, surpassing the performance of previous outlier-based methods. Furthermore, analysis of a single-cell RNA-seq dataset from mouse tracheal epithelial cells revealed, through PHet-based features, the presence of two distinct basal cell subtypes undergoing differentiation toward a luminal secretory phenotype. Notably, one of these subtypes exhibited high expression of BPIFA1. Interestingly, previous studies have linked BPIFA1 secretion to the emergence of secretory cells during mucociliary differentiation of airway epithelial cells. PHet successfully pinpointed the basal cell subtype associated with this phenomenon, a distinction that pre-annotated markers and dispersion-based features failed to make due to their admixed feature expression profiles. These findings underscore the potential of our method to deepen our understanding of the mechanisms underlying diseases and cell differentiation and contribute significantly to personalized medicine.
Competing Interest Statement: The authors have declared no competing interest.
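
The central observation in this abstract, that a between-class difference in IQR can flag features carrying subtype information, can be illustrated with a minimal sketch. This is not the PHet implementation: it omits the iterative subsampling and Fisher's method steps, and the function names and toy data are hypothetical, chosen only to show the statistic.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rank_by_iqr_difference(samples, labels):
    """Rank feature indices by |IQR(class 1) - IQR(class 0)|, largest first.

    samples: list of rows, one row of feature values per sample.
    labels:  list of 0/1 class labels, one per sample.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        group0 = [row[j] for row, y in zip(samples, labels) if y == 0]
        group1 = [row[j] for row, y in zip(samples, labels) if y == 1]
        scores.append(abs(iqr(group1) - iqr(group0)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 shifts its mean between classes but keeps
# the same spread; feature 1 is flat in class 0 and splits into two levels in
# class 1, as a hidden subtype might.
samples = [
    [1, 5], [2, 5], [3, 5], [4, 5],  # class 0
    [2, 5], [3, 5], [4, 9], [5, 9],  # class 1
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
order, scores = rank_by_iqr_difference(samples, labels)
```

In this toy case a mean-based test would favor feature 0, while the IQR difference ranks feature 1 first, which is the kind of feature the abstract describes as heterogeneity-preserving.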

Wang, Chuangqi, Hee June Choi, Lucy Woodbury, and Kwonmoo Lee. In Preparation. “Uncovering Interpretable Fine-Grained Phenotypes of Subcellular Dynamics through Unsupervised Self-Training of Deep Neural Networks”. BioRxiv, In Preparation. https://doi.org/10.1101/2021.05.25.445699.

Live cell imaging provides unparalleled insights into dynamic cellular processes across spatiotemporal scales. Despite its potential, the inherent spatiotemporal heterogeneity within live cell imaging data often obscures critical mechanical details underlying cellular dynamics. Uncovering fine-grained phenotypes of live cell dynamics is pivotal for a precise understanding of the heterogeneity of physiological and pathological processes. However, this endeavor introduces formidable technical challenges to unsupervised machine learning, demanding the extraction of features that can faithfully preserve heterogeneity, effectively discriminate between different molecularly perturbed states, and provide interpretability. While deep learning shows promise in extracting useful features from large datasets, it often falls short in producing such high-fidelity features, especially in unsupervised learning settings. To tackle these challenges, we present DeepHACX (Deep phenotyping of Heterogeneous Activities of Cellular dynamics with eXplanations), a self-training deep learning framework designed for fine-grained and interpretable phenotyping. This framework seamlessly integrates an unsupervised teacher model with interpretable features to facilitate feature learning in a student deep neural network (DNN). Significantly, it incorporates an autoencoder-based regularizer, termed SENSER (SENSitivity-enhancing autoEncoding Regularizer), designed to prompt the student DNN to maximize the heterogeneity associated with molecular perturbations. This approach enables the acquisition of features that not only discriminate between different molecularly perturbed states but also faithfully preserve the heterogeneity linked to these perturbations. In our study, DeepHACX successfully delineated fine-grained phenotypes within the heterogeneous protrusion dynamics of migrating epithelial cells, uncovering specific responses to pharmacological perturbations.
Remarkably, DeepHACX adeptly captured a minimal number of highly interpretable features uniquely linked to these fine-grained phenotypes, each corresponding to specific temporal intervals crucial for their manifestation. This unique capability positions DeepHACX as a valuable tool for investigating diverse cellular dynamics and comprehensively studying their heterogeneity.
Competing Interest Statement: The authors have declared no competing interest.

2023

Biber, John C, Andra Sullivan, Joseph A Brazzo, Yuna Heo, Bat-Ider Tumenbayar, Amanda Krajnik, Kerry E Poppenberg, et al. 2023. “Survivin As a Mediator of Stiffness-Induced Cell Cycle Progression and Proliferation of Vascular Smooth Muscle Cells”. APL Bioengineering 7 (4): 046108. https://doi.org/10.1063/5.0150532.

Stiffened arteries are a pathology of atherosclerosis, hypertension, and coronary artery disease and a key risk factor for cardiovascular disease events. The increased stiffness of arteries triggers a phenotypic switch, hypermigration, and hyperproliferation of vascular smooth muscle cells (VSMCs), leading to neointimal hyperplasia and accelerated neointima formation. However, the mechanism underlying this trigger remains unknown. Our analyses of whole-transcriptome microarray data from mouse VSMCs cultured on stiff hydrogels simulating arterial pathology identified 623 genes that were significantly and differentially expressed (360 upregulated and 263 downregulated) relative to expression in VSMCs cultured on soft hydrogels. Functional enrichment and gene network analyses revealed that these stiffness-sensitive genes are linked to cell cycle progression and proliferation. Importantly, we found that survivin, an inhibitor of apoptosis protein, mediates stiffness-dependent cell cycle progression and proliferation as determined by gene network and pathway analyses, RT-qPCR, immunoblotting, and cell proliferation assays. Furthermore, we found that inhibition of cell cycle progression did not reduce survivin expression, suggesting that survivin functions as an upstream regulator of cell cycle progression and proliferation in response to ECM stiffness. Mechanistically, we found that the stiffness signal is mechanotransduced via the FAK-E2F1 signaling axis to regulate survivin expression, establishing a regulatory pathway for how the stiffness of the cellular microenvironment affects VSMC behaviors. Overall, our findings indicate that survivin is necessary for VSMC cycling and proliferation and plays a role in regulating stiffness-responsive phenotypes.

Krajnik, Amanda, Erik Nimmer, Joseph A Brazzo, John C Biber, Rhonda Drewes, Bat-Ider Tumenbayar, Andra Sullivan, et al. 2023. “Survivin Regulates Intracellular Stiffness and Extracellular Matrix Production in Vascular Smooth Muscle Cells”. APL Bioengineering 7 (4): 046104. https://doi.org/10.1063/5.0157549.

Vascular dysfunction is a common cause of cardiovascular diseases characterized by the narrowing and stiffening of arteries, such as atherosclerosis, restenosis, and hypertension. Arterial narrowing results from the aberrant proliferation of vascular smooth muscle cells (VSMCs) and their increased synthesis and deposition of extracellular matrix (ECM) proteins. These, in turn, are modulated by arterial stiffness, but the mechanism for this is not fully understood. We found that survivin is an important regulator of stiffness-mediated ECM synthesis and intracellular stiffness in VSMCs. Whole-transcriptome analysis and cell culture experiments showed that survivin expression is upregulated in injured femoral arteries in mice and in human VSMCs cultured on stiff fibronectin-coated hydrogels. Suppressed expression of survivin in human VSMCs significantly decreased the stiffness-mediated expression of ECM components related to arterial stiffening, such as collagen-I, fibronectin, and lysyl oxidase. By contrast, expression of these ECM proteins was rescued by ectopic expression of survivin in human VSMCs cultured on soft hydrogels. Interestingly, atomic force microscopy analysis showed that suppressed or ectopic expression of survivin decreases or increases intracellular stiffness, respectively. Furthermore, we observed that inhibiting Rac and Rho reduces survivin expression, elucidating a mechanical pathway connecting intracellular tension, mediated by Rac and Rho, to survivin induction. Finally, we found that survivin inhibition decreases FAK phosphorylation, indicating that survivin-dependent intracellular tension feeds back to maintain signaling through FAK. These findings suggest a novel mechanism by which survivin potentially modulates arterial stiffness.

Jang, J., Y. Kim*, B. Westgate, Y. Zong, C. Hallinan, A. Akalin*, and K. Lee*. 2023. “Screening Adequacy of Unstained Fine Needle Aspiration Samples Using a Deep Learning-based Classifier”. Scientific Reports 13: 13525 (*Co-corresponding authors).

Fine needle aspiration (FNA) biopsy of thyroid nodules is a safe, cost-effective, and accurate diagnostic method for detecting thyroid cancer. However, about 10% of initial FNA biopsy samples from patients are non-diagnostic and require repeated FNA, which delays the diagnosis and appropriate care. On-site evaluation of the FNA sample can be performed to filter out non-diagnostic FNA samples. Unfortunately, it involves a time-consuming staining process, and a cytopathologist has to be present at the time of FNA. To bypass the staining process and expert interpretation of FNA specimens at the clinics, we developed a deep learning-based ensemble model termed FNA-Net that allows in situ screening of the adequacy of unstained thyroid FNA samples smeared on a glass slide, which can decrease the non-diagnostic rate in thyroid FNA. FNA-Net combines two deep learning models, a patch-based whole slide image classifier and Faster R-CNN, to detect follicular clusters with high precision. Then, FNA-Net classifies sample slides as non-diagnostic if the total number of detected follicular clusters is less than a predetermined threshold. With bootstrapped sampling, FNA-Net achieved a 0.81 F1 score and 0.84 AUC in the precision-recall curve for detecting non-diagnostic slides with fewer than six follicular clusters. We expect that FNA-Net can dramatically reduce the diagnostic cost associated with FNA biopsy and improve the quality of patient care.
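
The slide-level decision rule described in this abstract (count the detected follicular clusters, then compare the count against a threshold) can be sketched as follows. This is an illustration only, not FNA-Net code: the `Detection` type, the `min_score` confidence cutoff, and its value of 0.5 are assumptions, and the default of six clusters is borrowed from the evaluation setting quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected follicular cluster (hypothetical container type)."""
    score: float  # detector confidence in [0, 1]


def count_clusters(detections, min_score=0.5):
    """Count detections at or above an assumed confidence cutoff."""
    return sum(1 for d in detections if d.score >= min_score)


def is_non_diagnostic(detections, min_clusters=6, min_score=0.5):
    """Flag a slide as non-diagnostic when too few clusters are detected."""
    return count_clusters(detections, min_score) < min_clusters


# Hypothetical slides: five confident detections versus six.
sparse_slide = [Detection(0.9) for _ in range(5)]
adequate_slide = [Detection(0.9) for _ in range(6)]
sparse_flag = is_non_diagnostic(sparse_slide)      # fewer than six clusters
adequate_flag = is_non_diagnostic(adequate_slide)  # six clusters
```

Keeping the count and the threshold comparison separate makes it straightforward to sweep the threshold, which is what a precision-recall analysis of the non-diagnostic class would require.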

Pan*, X., C. Wang*, Y. Yu, N. Reljin, D. McManus, C. Darling, K. Chon**, Y. Mendelson**, and K. Lee**. 2023. “Deep cross-modal feature learning applied to predict acutely decompensated heart failure using in-home collected electrocardiography and transthoracic bioimpedance”. Artificial Intelligence in Medicine 140: 102548 (*Co-first authors, **Co-corresponding authors).

Background
Deep learning has been successfully applied to ECG data to aid in the accurate and more rapid diagnosis of acutely decompensated heart failure (ADHF). Previous applications focused primarily on classifying known ECG patterns in well-controlled clinical settings. However, this approach does not fully capitalize on the potential of deep learning, which directly learns important features without relying on a priori knowledge. In addition, deep learning applications to ECG data obtained from wearable devices have not been well studied, especially in the field of ADHF prediction.

Methods
We used ECG and transthoracic bioimpedance data from the SENTINEL-HF study, which enrolled patients (≥21 years) who were hospitalized with a primary diagnosis of heart failure or with ADHF symptoms. To build an ECG-based prediction model of ADHF, we developed a deep cross-modal feature learning pipeline, termed ECGX-Net, that utilizes raw ECG time series and transthoracic bioimpedance data from wearable devices. To extract rich features from ECG time series data, we first adopted a transfer learning approach in which ECG time series were transformed into 2D images, followed by feature extraction using ImageNet-pretrained DenseNet121/VGG19 models. After data filtering, we applied cross-modal feature learning in which a regressor was trained with ECG and transthoracic bioimpedance. Then, we concatenated the DenseNet121/VGG19 features with the regression features and used them to train a support vector machine (SVM) without bioimpedance information.

Results
The high-precision classifier using ECGX-Net predicted ADHF with a precision of 94 %, a recall of 79 %, and an F1-score of 0.85. The high-recall classifier with only DenseNet121 had a precision of 80 %, a recall of 98 %, and an F1-score of 0.88. We found that ECGX-Net was effective for high-precision classification, while DenseNet121 was effective for high-recall classification.

Conclusion
We show the potential for predicting ADHF from single-channel ECG recordings obtained from outpatients, enabling timely warning signs of heart failure. Our cross-modal feature learning pipeline is expected to improve ECG-based heart failure prediction by handling the unique requirements of medical scenarios and resource limitations.

*Co-first authors: X. Pan and C. Wang. **Co-corresponding authors: K. Chon, Y. Mendelson, and K. Lee

Jang, J., K. Lee*, and T. K. Kim*. 2023. “Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses”. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (*Co-Corresponding Authors).

Analyzing the dynamic changes of cellular morphology is important for understanding the various functions and characteristics of live cells, including stem cells and metastatic cancer cells. To this end, we need to track all points on the highly deformable cellular contour in every frame of live cell video. Local shapes and textures on the contour are not evident, and their motions are complex, often with expansion and contraction of local contour features. Prior methods for optical flow or deep point set tracking are unsuited due to the fluidity of cells, and previous deep contour tracking does not consider point correspondence. We propose the first deep learning-based tracking of cellular (or more generally viscoelastic materials) contours with point correspondence by fusing dense representation between two contours with cross attention. Since it is impractical to manually label dense tracking points on the contour, we propose unsupervised learning comprising mechanical and cycle consistency losses to train our contour tracker. The mechanical loss, which forces the points to move perpendicular to the contour, is effective. For quantitative evaluation, we labeled sparse tracking points along the contour of live cells from two live cell datasets taken with phase contrast and confocal fluorescence microscopes. Our contour tracker quantitatively outperforms compared methods and produces qualitatively more favorable results. Our code and data are publicly available.

Project page: https://junbongjang.github.io/projects/contour-tracking/index.html
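
As a rough illustration of the idea behind the mechanical loss in the abstract above (points should move perpendicular to the contour), the sketch below penalizes the tangential component of each point's displacement. It is an assumption-laden toy, not the loss used in the paper: the central-difference tangent estimate, the squared penalty, and the function name are all hypothetical.

```python
import math


def tangential_penalty(prev_pts, next_pts):
    """Mean squared tangential displacement along a closed contour.

    prev_pts, next_pts: lists of (x, y) tuples with point correspondence,
    i.e. next_pts[i] is the tracked position of prev_pts[i].
    """
    n = len(prev_pts)
    total = 0.0
    for i in range(n):
        # Tangent estimated by central difference of the neighboring points.
        x0, y0 = prev_pts[i - 1]
        x1, y1 = prev_pts[(i + 1) % n]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty)
        if norm == 0.0:
            continue  # degenerate neighbors: no tangent direction to penalize
        tx, ty = tx / norm, ty / norm
        dx = next_pts[i][0] - prev_pts[i][0]
        dy = next_pts[i][1] - prev_pts[i][1]
        total += (dx * tx + dy * ty) ** 2
    return total / n


# Hypothetical diamond-shaped contour with four points.
contour = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
expanded = [(2.0 * x, 2.0 * y) for x, y in contour]  # purely normal motion
slid = contour[1:] + contour[:1]                     # each point slides along the contour
normal_cost = tangential_penalty(contour, expanded)
sliding_cost = tangential_penalty(contour, slid)
```

Uniform expansion moves every point along its normal and incurs no penalty, whereas sliding points along the contour is penalized, which is the behavior the abstract attributes to the mechanical loss.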

2022

Jang, J., C. Hallinan, and K. Lee. 2022. “Protocol for live cell image segmentation to profile cellular morphodynamics using MARS-Net”. STAR Protocols 3: 101469.

Quantitative studies of cellular morphodynamics rely on accurate cell segmentation in live cell images. However, fluorescence and phase contrast imaging hinder accurate edge localization. To address this challenge, we developed MARS-Net, a deep learning model integrating an ImageNet-pretrained VGG19 encoder and a U-Net decoder trained on datasets from multiple types of microscopy images. Here, we provide the protocol for installing MARS-Net, labeling images, training MARS-Net for edge localization, evaluating the trained models’ performance, and performing the quantitative profiling of cellular morphodynamics.