Brown A, Kong SW, Kohane I, Patel C. ksRepo: a generalized platform for computational drug repositioning. BMC Bioinformatics. 2016;17(1):78. doi:10.1186/s12859-016-0931-y
BACKGROUND: Repositioning approved drugs and small molecules in novel therapeutic areas is of key interest to the pharmaceutical industry. A number of promising computational techniques have been developed to aid in repositioning; however, the majority of available methodologies require highly specific data inputs that preclude the use of many datasets and databases. There is a clear unmet need for a generalized methodology that enables the integration of multiple types of both gene expression data and database schema. RESULTS: ksRepo eliminates the need for a single microarray platform as input and allows for the use of a variety of drug and chemical exposure databases. We tested ksRepo's performance on a set of five prostate cancer datasets using the Comparative Toxicogenomics Database (CTD) as our database of gene-compound interactions. ksRepo successfully predicted significance for five frontline prostate cancer therapies, representing a significant enrichment from over 7000 CTD compounds, and achieved specificity similar to other repositioning methods. CONCLUSIONS: We present ksRepo, which enables investigators to use any data inputs for computational drug repositioning. ksRepo is implemented in a series of four functions in the R statistical environment under a BSD3 license. Source code is freely available at . A vignette is provided to aid users in performing ksRepo analysis.
Castro, Kong, Clements, Brady, Kaimal, Doyle, Robinson, Churchill, Kohane, Perlis. Absence of evidence for increase in risk for autism or attention-deficit hyperactivity disorder following antidepressant exposure during pregnancy: a replication study. Transl Psychiatry. 2016;6:e708. doi:10.1038/tp.2015.190
Multiple studies have examined the association between prenatal antidepressant exposure and risk for autism spectrum disorder (ASD) or attention-deficit hyperactivity disorder (ADHD), with inconsistent results. Precisely estimating such risk, if any, is of great importance in light of the need to balance such risk with the benefit of depression and anxiety treatment. We developed a method to integrate data from multiple New England health systems, matching offspring and maternal health data in electronic health records to characterize diagnoses and medication exposure. Children with ASD or ADHD were matched 1:3 with children without neurodevelopmental disorders. Association between maternal antidepressant exposure and ASD or ADHD liability was examined using logistic regression, adjusting for potential sociodemographic and psychiatric confounding variables. In new cohorts of 1245 ASD cases and 1701 ADHD cases, along with age-, sex- and socioeconomic status-matched controls, neither disorder was significantly associated with prenatal antidepressant exposure in crude or adjusted models (adjusted odds ratio 0.90, 95% confidence interval 0.50-1.54 for ASD; 0.97, 95% confidence interval 0.53-1.69 for ADHD). Pre-pregnancy antidepressant exposure significantly increased risk for both disorders. These results suggest that prior reports of association between prenatal antidepressant exposure and neurodevelopmental disease are likely to represent a false-positive finding, which may arise in part through confounding by indication. They further demonstrate the potential to integrate data across electronic health records studies spanning multiple health systems to enable efficient pharmacovigilance investigation.


Prilutsky D, Kho A, Palmer N, Bhakar A, Smedemark-Margulies N, Kong SW, Margulies D, Bear M, Kohane I. Gene expression analysis in Fmr1KO mice identifies an immunological signature in brain tissue and mGluR5-related signaling in primary neuronal cultures. Mol Autism. 2015;6:66. doi:10.1186/s13229-015-0061-9
BACKGROUND: Fragile X syndrome (FXS) is a neurodevelopmental disorder whose biochemical manifestations involve dysregulation of mGluR5-dependent pathways, which are widely modeled using cultured neurons. In vitro phenotypes in cultured neurons using standard morphological, functional, and chemical approaches have demonstrated considerable variability. Here, we study transcriptomes obtained in situ in the intact brain tissues of a murine model of FXS to see how they reflect the in vitro state. METHODS: We used genome-wide mRNA expression profiling as a robust characterization tool for studying differentially expressed pathways in fragile X mental retardation 1 (Fmr1) knockout (KO) and wild-type (WT) murine primary neuronal cultures and in embryonic hippocampal and cortical murine tissue. To study the developmental trajectory and to relate mouse model data to human data, we used an expression map of human development to plot murine differentially expressed genes in KO/WT cultures and brain. RESULTS: We found that transcriptomes from cell cultures showed a stronger signature of Fmr1KO than whole tissue transcriptomes. We observed an over-representation of immunological signaling pathways in embryonic Fmr1KO cortical and hippocampal tissues and over-represented mGluR5-downstream signaling pathways in Fmr1KO cortical and hippocampal primary cultures. Genes whose expression was up-regulated in Fmr1KO murine cultures tended to peak early in human development, whereas differentially expressed genes in embryonic cortical and hippocampal tissues clustered with genes expressed later in human development. CONCLUSIONS: The transcriptional profile in brain tissues primarily centered on immunological mechanisms, whereas the profiles from cell cultures showed defects in neuronal activity. We speculate that the isolation and culturing of neurons caused a shift in neurological transcriptome towards a "juvenile" or "de-differentiated" state. 
Moreover, cultured neurons lack the close coupling with glia that might be responsible for the immunological phenotype in the intact brain. Our results suggest that cultured cells may recapitulate an early phase of the disease, which is also less obscured with a consequent "immunological" phenotype and in vivo compensatory mechanisms observed in the embryonic brain. Together, these results suggest that the transcriptome of cultured primary neuronal cells, in comparison to whole brain tissue, more robustly demonstrated the difference between Fmr1KO and WT mice and might reveal a molecular phenotype, which is typically hidden by compensatory mechanisms present in vivo. Moreover, cultures might be useful for investigating the perturbed pathways in early human brain development and genes previously implicated in autism.
Seok H-S, Song T, Kong SW, Hwang K-B. An Efficient Search Algorithm for Finding Genomic-Range Overlaps Based on the Maximum Range Length. IEEE/ACM Trans Comput Biol Bioinform. 2015;12(4):778–84. doi:10.1109/TCBB.2014.2369042
Efficient search algorithms for finding genomic-range overlaps are essential for various bioinformatics applications. Most fast algorithms for searching the overlaps between a query range (e.g., a genomic variant) and a set of N reference ranges (e.g., exons) have a time complexity of O(k + log N), where k denotes a term related to the length and location of the reference ranges. Here, we present a simple but efficient algorithm that reduces k, based on the maximum reference range length. Specifically, for a given query range and the maximum reference range length, the proposed method divides the reference range set into three subsets: always, potentially, and never overlapping. Search effort can therefore be reduced by excluding the never-overlapping subset. We demonstrate that the running time of the proposed algorithm is proportional to the size of the potentially overlapping subset, which in turn is proportional to the maximum reference range length if all other conditions are the same. Moreover, an implementation of our algorithm was 13.8 to 30.0 percent faster than one of the fastest range search methods available when tested on various genomic-range data sets. The proposed algorithm has been incorporated into a disease-linked variant prioritization pipeline for WGS ( and its implementation is available at
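The three-way partition described in this abstract can be illustrated with a short Python sketch. This is not the authors' implementation; it assumes closed integer ranges, defines range length as end minus start, and uses hypothetical function names (`build_index`, `find_overlaps`). With reference ranges sorted by start, two binary-search bounds discard the never-overlapping ranges, and only the potentially overlapping ones need an explicit end-coordinate check.

```python
from bisect import bisect_left, bisect_right

def build_index(refs):
    """Sort closed ranges (start, end) by start and record the maximum length.

    Hypothetical helper for illustration; length is taken as end - start.
    """
    refs = sorted(refs)
    starts = [s for s, _ in refs]
    max_len = max((e - s for s, e in refs), default=0)
    return refs, starts, max_len

def find_overlaps(index, q_start, q_end):
    """Return all reference ranges overlapping the closed query [q_start, q_end]."""
    refs, starts, max_len = index
    # Ranges starting before q_start - max_len must end before q_start: never overlap.
    lo = bisect_left(starts, q_start - max_len)
    # Ranges starting in [q_start - max_len, q_start) may or may not reach the query.
    mid = bisect_left(starts, q_start)
    # Ranges starting in [q_start, q_end] always overlap; later starts never do.
    hi = bisect_right(starts, q_end)
    hits = [r for r in refs[lo:mid] if r[1] >= q_start]  # check "potentially" subset
    hits.extend(refs[mid:hi])                            # take "always" subset as-is
    return hits

if __name__ == "__main__":
    exons = [(1, 5), (3, 4), (10, 20), (15, 18), (30, 31)]
    idx = build_index(exons)
    print(find_overlaps(idx, 5, 12))
```

In this sketch, only the slice `refs[lo:mid]` is scanned element by element, so the per-query cost beyond the binary searches grows with the number of ranges starting within one maximum range length before the query, which matches the dependence on maximum reference range length that the abstract describes.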
Park J-H, Cho B, Kwon H, Prilutsky D, Yun JM, Choi HC, Hwang K-B, Lee I-H, Kim J-I, Kong SW. I148M variant in PNPLA3 reduces central adiposity and metabolic disease risks while increasing nonalcoholic fatty liver disease. Liver Int. 2015. doi:10.1111/liv.12909
BACKGROUNDS & AIMS: The I148M variant due to the substitution of C to G in PNPLA3 (rs738409) is associated with an increased risk of nonalcoholic fatty liver disease (NAFLD). In the liver, the I148M variant reduces the hydrolytic function of PNPLA3, which results in hepatic steatosis; however, its association with other clinical phenotypes such as adiposity and metabolic diseases is not well established. METHODS: To identify the impact of the I148M variant on clinical risk factors of NAFLD, we recruited 1,363 generally healthy Korean males after excluding alcoholic and secondary causes of hepatic steatosis. Central adiposity was assessed by computed tomography, and hepatic steatosis was evaluated by abdominal ultrasonography. RESULTS: The participants were predominantly middle-aged (49.0 ± 7.1 years; range 30-60 years), and the frequency of NAFLD was 44.2%. The rs738409-G allele carriers had a 1.19-fold increased risk for NAFLD (minor allele frequency 0.43; allelic odds ratio 1.38; P = 4.3 × 10^-5). Interestingly, the rs738409 GG carriers showed significantly lower levels of visceral and subcutaneous adiposity (P
Zhang, Zhou, Ogmundsdottir, Möller, Siddaway, Larue, Hsing, Kong, Goding, Palsson, et al. Mitf is a master regulator of the v-ATPase forming an Mitf/v-ATPase/TORC1 control module for cellular homeostasis. J Cell Sci. 2015. doi:10.1242/jcs.173807
The v-ATPase is a fundamental eukaryotic enzyme central to cellular homeostasis. Although its impact on key metabolic regulators such as TORC1 is well-documented, our knowledge of mechanisms that regulate v-ATPase activity is limited. Here, we report that the Drosophila transcription factor Mitf is a master regulator of this holoenzyme. Mitf directly controls transcription of all 15 v-ATPase components through M-box cis-sites and this coordinated regulation impacts holoenzyme activity in vivo. In addition, through the v-ATPase, Mitf promotes the activity of TORC1, which in turn negatively regulates Mitf. We provide evidence that Mitf, v-ATPase and TORC1 form a negative regulatory loop that maintains each of these important metabolic regulators in relative balance. Interestingly, direct regulation of v-ATPase genes by human MITF also occurs in cells of the melanocytic lineage, showing mechanistic conservation in the regulation of the v-ATPase by MITF-TFE proteins in fly and mammals. Collectively, this evidence points to an ancient Mitf/v-ATPase/TORC1 module that serves as a dynamic modulator of metabolism for cellular homeostasis.


McLaughlin H, Ceyhan-Birsoy O, Christensen K, Kohane I, Krier J, Lane W, Lautenbach D, Lebo M, Machini K, MacRae C, et al. A systematic approach to the reporting of medically relevant findings from whole genome sequencing. BMC Med Genet. 2014;15:134. doi:10.1186/s12881-014-0134-1
BACKGROUND: The MedSeq Project is a randomized clinical trial developing approaches to assess the impact of integrating genome sequencing into clinical medicine. To facilitate the return of results of potential medical relevance to physicians and patients participating in the MedSeq Project, we sought to develop a reporting approach for the effective communication of such findings. METHODS: Genome sequencing was performed on the Illumina HiSeq platform. Variants were filtered, interpreted, and validated according to methods developed by the Laboratory for Molecular Medicine and consistent with current professional guidelines. The GeneInsight software suite, which is integrated with the Partners HealthCare electronic health record, was used for variant curation, report drafting, and delivery. RESULTS: We developed a concise 5-6 page Genome Report (GR) featuring a single-page summary of results of potential medical relevance with additional pages containing structured variant, gene, and disease information along with supporting evidence for reported variants and brief descriptions of associated diseases and clinical implications. The GR is formatted to provide a succinct summary of genomic findings, enabling physicians to take appropriate steps for disease diagnosis, prevention, and management in their patients. CONCLUSIONS: Our experience highlights important considerations for the reporting of results of potential medical relevance and provides a framework for interpretation and reporting practices in clinical genome sequencing.
Vassy, Lautenbach, McLaughlin, Kong, Christensen, Krier, Kohane, Feuerman, Blumenthal-Barby, Roberts, et al. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine. Trials. 2014;15:85.
BACKGROUND: Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. METHODS/DESIGN: This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. DISCUSSION: The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. 
That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. This project will not only illuminate the impact of integrating genomic medicine into the clinical care of patients but also inform the design of future studies. TRIAL REGISTRATION: ClinicalTrials.gov identifier NCT01736566.
Lee, Lee, Hsing, Choe, Park, Kim, Bohn, Neu, Hwang, Green, et al. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline. Hum Mutat. 2014;35:537–47.
Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present gNOME, one of the fastest and most scalable annotation, filtering, and analysis pipelines, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results of specific variants, genes, and gene sets between two groups, or compared with population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for the variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. gNOME Web server and source codes are freely available to the academic community (