Publications

2016

Brown A, Kong SW, Kohane I, Patel C. ksRepo: a generalized platform for computational drug repositioning. BMC Bioinformatics. 2016;17(1):78. doi:10.1186/s12859-016-0931-y

BACKGROUND: Repositioning approved drugs and small molecules in novel therapeutic areas is of key interest to the pharmaceutical industry. A number of promising computational techniques have been developed to aid in repositioning; however, the majority of available methodologies require highly specific data inputs that preclude the use of many datasets and databases. There is a clear unmet need for a generalized methodology that enables the integration of multiple types of both gene expression data and database schema. RESULTS: ksRepo eliminates the need for a single microarray platform as input and allows for the use of a variety of drug and chemical exposure databases. We tested ksRepo's performance on a set of five prostate cancer datasets using the Comparative Toxicogenomics Database (CTD) as our database of gene-compound interactions. ksRepo successfully predicted significance for five frontline prostate cancer therapies, representing a significant enrichment from over 7000 CTD compounds, and achieved specificity similar to other repositioning methods. CONCLUSIONS: We present ksRepo, which enables investigators to use any data inputs for computational drug repositioning. ksRepo is implemented in a series of four functions in the R statistical environment under a BSD3 license. Source code is freely available at http://github.com/adam-sam-brown/ksRepo. A vignette is provided to aid users in performing ksRepo analysis.

Castro, Kong, Clements, Brady, Kaimal, Doyle, Robinson, Churchill, Kohane, Perlis. Absence of evidence for increase in risk for autism or attention-deficit hyperactivity disorder following antidepressant exposure during pregnancy: a replication study. Transl Psychiatry. 2016;6:e708. doi:10.1038/tp.2015.190

Multiple studies have examined the risk of prenatal antidepressant exposure and risk for autism spectrum disorder (ASD) or attention-deficit hyperactivity disorder (ADHD), with inconsistent results. Precisely estimating such risk, if any, is of great importance in light of the need to balance such risk with the benefit of depression and anxiety treatment. We developed a method to integrate data from multiple New England health systems, matching offspring and maternal health data in electronic health records to characterize diagnoses and medication exposure. Children with ASD or ADHD were matched 1:3 with children without neurodevelopmental disorders. Association between maternal antidepressant exposure and ASD or ADHD liability was examined using logistic regression, adjusting for potential sociodemographic and psychiatric confounding variables. In new cohorts of 1245 ASD cases and 1701 ADHD cases, along with age-, sex- and socioeconomic status matched controls, neither disorder was significantly associated with prenatal antidepressant exposure in crude or adjusted models (adjusted odds ratio 0.90, 95% confidence interval 0.50-1.54 for ASD; 0.97, 95% confidence interval 0.53-1.69 for ADHD). Pre-pregnancy antidepressant exposure significantly increased risk for both disorders. These results suggest that prior reports of association between prenatal antidepressant exposure and neurodevelopmental disease are likely to represent a false-positive finding, which may arise in part through confounding by indication. They further demonstrate the potential to integrate data across electronic health records studies spanning multiple health systems to enable efficient pharmacovigilance investigation.

2015

Seok H-S, Song T, Kong SW, Hwang K-B. An Efficient Search Algorithm for Finding Genomic-Range Overlaps Based on the Maximum Range Length. IEEE/ACM Trans Comput Biol Bioinform. 2015;12(4):778–84. doi:10.1109/TCBB.2014.2369042

Efficient search algorithms for finding genomic-range overlaps are essential for various bioinformatics applications. A majority of fast algorithms for searching the overlaps between a query range (e.g., a genomic variant) and a set of N reference ranges (e.g., exons) have a time complexity of O(k + logN), where k denotes a term related to the length and location of the reference ranges. Here, we present a simple but efficient algorithm that reduces k, based on the maximum reference range length. Specifically, for a given query range and the maximum reference range length, the proposed method divides the reference range set into three subsets: always, potentially, and never overlapping. Therefore, search effort can be reduced by excluding the never-overlapping subset. We demonstrate that the running time of the proposed algorithm is proportional to the size of the potentially overlapping subset, which is in turn proportional to the maximum reference range length if all the other conditions are the same. Moreover, an implementation of our algorithm was 13.8 to 30.0 percent faster than one of the fastest range search methods available when tested on various genomic-range data sets. The proposed algorithm has been incorporated into a disease-linked variant prioritization pipeline for WGS (http://gnome.tchlab.org) and its implementation is available at http://ml.ssu.ac.kr/gSearch.
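The three-way split described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' gSearch implementation; it is a reconstruction that assumes closed integer ranges with start <= end, a reference set sorted by start position, and binary search over the start coordinates. The function names `build_index` and `find_overlaps` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(ranges):
    """Sort closed (start, end) ranges by start and record the maximum length.

    Hypothetical helper: the sorted-list layout is an assumption of this sketch.
    """
    ordered = sorted(ranges)
    starts = [start for start, _ in ordered]
    max_len = max((end - start for start, end in ordered), default=0)
    return ordered, starts, max_len


def find_overlaps(index, q_start, q_end):
    """Return every reference range that overlaps the closed query range."""
    ordered, starts, max_len = index

    # Never overlapping: start < q_start - max_len implies end < q_start,
    # and start > q_end lies entirely to the right of the query.
    lo = bisect_left(starts, q_start - max_len)
    hi = bisect_right(starts, q_end)

    # Always overlapping: q_start <= start <= q_end.
    mid = bisect_left(starts, q_start)

    # Potentially overlapping: start in [q_start - max_len, q_start).
    # Only these ranges need their end coordinate checked.
    hits = [r for r in ordered[lo:mid] if r[1] >= q_start]
    hits.extend(ordered[mid:hi])
    return hits


exons = [(1, 5), (3, 4), (10, 20), (15, 16), (30, 31)]
index = build_index(exons)
variant_hits = find_overlaps(index, 4, 12)
```

In this sketch only the slice `ordered[lo:mid]` is scanned element by element, and its size grows with the maximum reference range length, which mirrors the proportionality stated in the abstract.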

Prilutsky D, Kho A, Palmer N, Bhakar A, Smedemark-Margulies N, Kong SW, Margulies D, Bear M, Kohane I. Gene expression analysis in Fmr1KO mice identifies an immunological signature in brain tissue and mGluR5-related signaling in primary neuronal cultures. Mol Autism. 2015;6:66. doi:10.1186/s13229-015-0061-9

BACKGROUND: Fragile X syndrome (FXS) is a neurodevelopmental disorder whose biochemical manifestations involve dysregulation of mGluR5-dependent pathways, which are widely modeled using cultured neurons. In vitro phenotypes in cultured neurons using standard morphological, functional, and chemical approaches have demonstrated considerable variability. Here, we study transcriptomes obtained in situ in the intact brain tissues of a murine model of FXS to see how they reflect the in vitro state. METHODS: We used genome-wide mRNA expression profiling as a robust characterization tool for studying differentially expressed pathways in fragile X mental retardation 1 (Fmr1) knockout (KO) and wild-type (WT) murine primary neuronal cultures and in embryonic hippocampal and cortical murine tissue. To study the developmental trajectory and to relate mouse model data to human data, we used an expression map of human development to plot murine differentially expressed genes in KO/WT cultures and brain. RESULTS: We found that transcriptomes from cell cultures showed a stronger signature of Fmr1KO than whole tissue transcriptomes. We observed an over-representation of immunological signaling pathways in embryonic Fmr1KO cortical and hippocampal tissues and over-represented mGluR5-downstream signaling pathways in Fmr1KO cortical and hippocampal primary cultures. Genes whose expression was up-regulated in Fmr1KO murine cultures tended to peak early in human development, whereas differentially expressed genes in embryonic cortical and hippocampal tissues clustered with genes expressed later in human development. CONCLUSIONS: The transcriptional profile in brain tissues primarily centered on immunological mechanisms, whereas the profiles from cell cultures showed defects in neuronal activity. We speculate that the isolation and culturing of neurons caused a shift in neurological transcriptome towards a "juvenile" or "de-differentiated" state. 
Moreover, cultured neurons lack the close coupling with glia that might be responsible for the immunological phenotype in the intact brain. Our results suggest that cultured cells may recapitulate an early phase of the disease, which is also less obscured with a consequent "immunological" phenotype and in vivo compensatory mechanisms observed in the embryonic brain. Together, these results suggest that the transcriptome of cultured primary neuronal cells, in comparison to whole brain tissue, more robustly demonstrated the difference between Fmr1KO and WT mice and might reveal a molecular phenotype, which is typically hidden by compensatory mechanisms present in vivo. Moreover, cultures might be useful for investigating the perturbed pathways in early human brain development and genes previously implicated in autism.

Zhang, Zhou, Ogmundsdottir, Möller, Siddaway, Larue, Hsing, Kong, Goding, Palsson, et al. Mitf is a master regulator of the v-ATPase forming an Mitf/v-ATPase/TORC1 control module for cellular homeostasis. J Cell Sci. 2015. doi:10.1242/jcs.173807

The v-ATPase is a fundamental eukaryotic enzyme central to cellular homeostasis. Although its impact on key metabolic regulators such as TORC1 is well-documented, our knowledge of mechanisms that regulate v-ATPase activity is limited. Here, we report that the Drosophila transcription factor Mitf is a master regulator of this holoenzyme. Mitf directly controls transcription of all 15 v-ATPase components through M-box cis-sites and this coordinated regulation impacts holoenzyme activity in vivo. In addition, through the v-ATPase, Mitf promotes the activity of TORC1, which in turn negatively regulates Mitf. We provide evidence that Mitf, v-ATPase and TORC1 form a negative regulatory loop that maintains each of these important metabolic regulators in relative balance. Interestingly, direct regulation of v-ATPase genes by human MITF also occurs in cells of the melanocytic lineage, showing mechanistic conservation in the regulation of the v-ATPase by MITF-TFE proteins in fly and mammals. Collectively, this evidence points to an ancient Mitf/v-ATPase/TORC1 module that serves as a dynamic modulator of metabolism for cellular homeostasis.

Park J-H, Cho B, Kwon H, Prilutsky D, Yun JM, Choi HC, Hwang K-B, Lee I-H, Kim J-I, Kong SW. I148M variant in PNPLA3 reduces central adiposity and metabolic disease risks while increasing nonalcoholic fatty liver disease. Liver Int. 2015. doi:10.1111/liv.12909

BACKGROUNDS & AIMS: The I148M variant due to the substitution of C to G in PNPLA3 (rs738409) is associated with an increased risk of nonalcoholic fatty liver disease (NAFLD). In the liver, the I148M variant reduces the hydrolytic function of PNPLA3, which results in hepatic steatosis; however, its association with other clinical phenotypes such as adiposity and metabolic diseases is not well established. METHODS: To identify the impact of the I148M variant on clinical risk factors of NAFLD, we recruited 1,363 generally healthy Korean males after excluding alcoholic and secondary causes of hepatic steatosis. Central adiposity was assessed by computed tomography, and hepatic steatosis was evaluated by abdominal ultrasonography. RESULTS: The participants were predominantly middle-aged (49.0 ± 7.1 years; range 30-60 years), and the frequency of NAFLD was 44.2%. The rs738409-G allele carriers had a 1.19-fold increased risk for NAFLD (minor allele frequency 0.43; allelic odds ratio 1.38; P = 4.3×10⁻⁵). Interestingly, the rs738409 GG carriers showed significantly lower levels of visceral and subcutaneous adiposity (P < 0.001 and P = 0.015, respectively), BMI (P < 0.001), triglycerides (P < 0.001), and insulin resistance (P = 0.002) compared to CC carriers. These negative associations between clinical risk factors and rs738409-G dosage were more prominent in the non-NAFLD group than in the NAFLD group. CONCLUSIONS: The I148M variant, although increasing the risk of NAFLD, was associated with reduced levels of central adiposity, BMI, serum triglycerides, and insulin resistance, suggesting differential roles in fat storage and distribution according to cell types and metabolic status.

2014

McLaughlin H, Ceyhan-Birsoy O, Christensen K, Kohane I, Krier J, Lane W, Lautenbach D, Lebo M, Machini K, MacRae C, et al. A systematic approach to the reporting of medically relevant findings from whole genome sequencing. BMC Med Genet. 2014;15:134. doi:10.1186/s12881-014-0134-1

BACKGROUND: The MedSeq Project is a randomized clinical trial developing approaches to assess the impact of integrating genome sequencing into clinical medicine. To facilitate the return of results of potential medical relevance to physicians and patients participating in the MedSeq Project, we sought to develop a reporting approach for the effective communication of such findings. METHODS: Genome sequencing was performed on the Illumina HiSeq platform. Variants were filtered, interpreted, and validated according to methods developed by the Laboratory for Molecular Medicine and consistent with current professional guidelines. The GeneInsight software suite, which is integrated with the Partners HealthCare electronic health record, was used for variant curation, report drafting, and delivery. RESULTS: We developed a concise 5-6 page Genome Report (GR) featuring a single-page summary of results of potential medical relevance with additional pages containing structured variant, gene, and disease information along with supporting evidence for reported variants and brief descriptions of associated diseases and clinical implications. The GR is formatted to provide a succinct summary of genomic findings, enabling physicians to take appropriate steps for disease diagnosis, prevention, and management in their patients. CONCLUSIONS: Our experience highlights important considerations for the reporting of results of potential medical relevance and provides a framework for interpretation and reporting practices in clinical genome sequencing.

Seok, Song, Kong, Hwang. An Efficient Search Algorithm for Finding Genomic-range Overlaps Based on the Maximum Range Length. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;PrePrints.

Jarvik, Amendola, Berg, Brothers, Clayton, Chung, Evans, Evans, Fullerton, Gallego, et al. Return of genomic results to research participants: the floor, the ceiling, and the choices in between. Am J Hum Genet. 2014;94:818–26.

As more research studies incorporate next-generation sequencing (including whole-genome or whole-exome sequencing), investigators and institutional review boards face difficult questions regarding which genomic results to return to research participants and how. An American College of Medical Genetics and Genomics 2013 policy paper suggesting that pathogenic mutations in 56 specified genes should be returned in the clinical setting has raised the question of whether comparable recommendations should be considered in research settings. The Clinical Sequencing Exploratory Research (CSER) Consortium and the Electronic Medical Records and Genomics (eMERGE) Network are multisite research programs that aim to develop practical strategies for addressing questions concerning the return of results in genomic research. CSER and eMERGE committees have identified areas of consensus regarding the return of genomic results to research participants. In most circumstances, if results meet an actionability threshold for return and the research participant has consented to return, genomic results, along with referral for appropriate clinical follow-up, should be offered to participants. However, participants have a right to decline the receipt of genomic results, even when doing so might be viewed as a threat to the participants' health. Research investigators should be prepared to return research results and incidental findings discovered in the course of their research and meeting an actionability threshold, but they have no ethical obligation to actively search for such results. These positions are consistent with the recognition that clinical research is distinct from medical care in both its aims and its guiding moral principles.

Kong, Lee, Leshchiner, Krier, Kraft, Rehm, Green, Kohane, MacRae. Summarizing polygenic risks for complex diseases in a clinical whole-genome report. Genet Med. 2014.

Purpose: Disease-causing mutations and pharmacogenomic variants are of primary interest for clinical whole-genome sequencing. However, estimating genetic liability for common complex diseases using established risk alleles might one day prove clinically useful. Methods: We compared polygenic scoring methods using a case-control data set with independently discovered risk alleles in the MedSeq Project. For eight traits of clinical relevance in both the primary-care and cardiomyopathy study cohorts, we estimated multiplicative polygenic risk scores using 161 published risk alleles and then normalized them using the population median estimated from the 1000 Genomes Project. Results: Our polygenic score approach identified the overrepresentation of independently discovered risk alleles in cases as compared with controls using a large-scale genome-wide association study data set. In addition to normalized multiplicative polygenic risk scores and rank in a population, the disease prevalence and proportion of heritability explained by known common risk variants provide important context in the interpretation of modern multilocus disease risk models. Conclusion: Our approach in the MedSeq Project demonstrates how complex trait risk variants from an individual genome can be summarized and reported for the general clinician and also highlights the need for definitive clinical studies to obtain reference data for such estimates and to establish clinical utility. Genet Med advance online publication 23 October 2014; doi:10.1038/gim.2014.143.
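As a rough illustration of the scoring step described in this abstract, the sketch below computes a multiplicative polygenic risk score and divides it by the median score of a reference population. It is not the MedSeq Project's code. It assumes the common form in which each risk allele contributes its published per-allele odds ratio raised to the number of copies carried (0, 1, or 2); the function names and the toy numbers are hypothetical.

```python
import math
from statistics import median


def multiplicative_score(odds_ratios, allele_counts):
    """Product of OR_i ** n_i over risk alleles, accumulated in log space.

    Assumption: per-allele odds ratios combine multiplicatively across loci.
    """
    log_score = sum(
        count * math.log(odds_ratio)
        for odds_ratio, count in zip(odds_ratios, allele_counts)
    )
    return math.exp(log_score)


def normalized_score(odds_ratios, individual, reference_population):
    """Score of one individual relative to the reference population median."""
    reference_scores = [
        multiplicative_score(odds_ratios, genotype)
        for genotype in reference_population
    ]
    return multiplicative_score(odds_ratios, individual) / median(reference_scores)


# Toy data: two risk alleles and three reference genotypes (illustrative only).
toy_odds_ratios = [1.2, 1.5]
toy_reference = [[0, 0], [1, 0], [2, 2]]
toy_result = normalized_score(toy_odds_ratios, [2, 1], toy_reference)
```

A normalized value above 1 means the individual's score is higher than the reference median. As the abstract notes, such a value still needs disease prevalence and explained heritability as context before it can be interpreted.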

Sek Won Kong
