Publications

2014

Kong, Lee, Leshchiner, Krier, Kraft, Rehm, Green, Kohane, MacRae. Summarizing polygenic risks for complex diseases in a clinical whole-genome report. Genet Med. 2014.

Purpose: Disease-causing mutations and pharmacogenomic variants are of primary interest for clinical whole-genome sequencing. However, estimating genetic liability for common complex diseases using established risk alleles might one day prove clinically useful. Methods: We compared polygenic scoring methods using a case-control data set with independently discovered risk alleles in the MedSeq Project. For eight traits of clinical relevance in both the primary-care and cardiomyopathy study cohorts, we estimated multiplicative polygenic risk scores using 161 published risk alleles and then normalized them using the population median estimated from the 1000 Genomes Project. Results: Our polygenic score approach identified the overrepresentation of independently discovered risk alleles in cases as compared with controls using a large-scale genome-wide association study data set. In addition to normalized multiplicative polygenic risk scores and rank in a population, the disease prevalence and proportion of heritability explained by known common risk variants provide important context in the interpretation of modern multilocus disease risk models. Conclusion: Our approach in the MedSeq Project demonstrates how complex trait risk variants from an individual genome can be summarized and reported for the general clinician and also highlights the need for definitive clinical studies to obtain reference data for such estimates and to establish clinical utility. Genet Med advance online publication 23 October 2014. Genetics in Medicine (2014); doi:10.1038/gim.2014.143.

Kong, Sahin, Collins, Wertz, Campbell, Leech, Krueger, Bear, Kunkel, Kohane. Divergent dysregulation of gene expression in murine models of fragile X syndrome and tuberous sclerosis. Mol Autism. 2014;5:16.

BACKGROUND: Fragile X syndrome and tuberous sclerosis are genetic syndromes that both have a high rate of comorbidity with autism spectrum disorder (ASD). Several lines of evidence suggest that these two monogenic disorders may converge at a molecular level through the dysfunction of activity-dependent synaptic plasticity. METHODS: To explore the characteristics of transcriptomic changes in these monogenic disorders, we profiled genome-wide gene expression levels in cerebellum and blood from murine models of fragile X syndrome and tuberous sclerosis. RESULTS: Differentially expressed genes and enriched pathways were distinct for the two murine models examined, with the exception of immune response-related pathways. In the cerebellum of the Fmr1 knockout (Fmr1-KO) model, the neuroactive ligand receptor interaction pathway and gene sets associated with synaptic plasticity such as long-term potentiation, gap junction, and axon guidance were the most significantly perturbed pathways. The phosphatidylinositol signaling pathway was significantly dysregulated in both cerebellum and blood of Fmr1-KO mice. In Tsc2 heterozygous (+/-) mice, immune system-related pathways, genes encoding ribosomal proteins, and glycolipid metabolism pathways were significantly changed in both tissues. CONCLUSIONS: Our data suggest that distinct molecular pathways may be involved in ASD with known but different genetic causes and that blood gene expression profiles of Fmr1-KO and Tsc2+/- mice mirror some, but not all, of the perturbed molecular pathways in the brain.

Seok, Song, Kong, Hwang. An Efficient Search Algorithm for Finding Genomic-range Overlaps Based on the Maximum Range Length. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;PrePrints.

Vassy, Lautenbach, McLaughlin, Kong, Christensen, Krier, Kohane, Feuerman, Blumenthal-Barby, Roberts, et al. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine. Trials. 2014;15:85.

BACKGROUND: Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. METHODS/DESIGN: This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. DISCUSSION: The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. 
That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. This project will not only illuminate the impact of integrating genomic medicine into the clinical care of patients but also inform the design of future studies. TRIAL REGISTRATION: ClinicalTrials.gov identifier NCT01736566.

2013

Campbell, Kohane, Kong. Pathway-based outlier method reveals heterogeneous genomic structure of autism in blood transcriptome. BMC Med Genomics. 2013;6:34.

BACKGROUND: Decades of research strongly suggest that the genetic etiology of autism spectrum disorders (ASDs) is heterogeneous. However, most published studies focus on group differences between cases and controls. In contrast, we hypothesized that the heterogeneity of the disorder could be characterized by identifying pathways for which individuals are outliers rather than pathways representative of shared group differences of the ASD diagnosis. METHODS: Two previously published blood gene expression data sets--the Translational Genetics Research Institute (TGen) dataset (70 cases and 60 unrelated controls) and the Simons Simplex Consortium (Simons) dataset (221 probands and 191 unaffected family members)--were analyzed. All individuals of each dataset were projected to biological pathways, and each sample's Mahalanobis distance from a pooled centroid was calculated to compare the number of case and control outliers for each pathway. RESULTS: Analysis of a set of blood gene expression profiles from 70 ASD and 60 unrelated controls revealed three pathways whose outliers were significantly overrepresented in the ASD cases: neuron development including axonogenesis and neurite development (29% of ASD, 3% of control), nitric oxide signaling (29%, 3%), and skeletal development (27%, 3%). Overall, 50% of cases and 8% of controls were outliers in one of these three pathways, which could not be identified using group comparison or gene-level outlier methods. In an independently collected data set consisting of 221 ASD and 191 unaffected family members, outliers in the neurogenesis pathway were heavily biased towards cases (20.8% of ASD, 12.0% of control). Interestingly, neurogenesis outliers were more common among unaffected family members (Simons) than unrelated controls (TGen), but the statistical significance of this effect was marginal (Chi squared P < 0.09).
CONCLUSIONS: Unlike group difference approaches, our analysis identified the samples within the case and control groups that manifested each expression signal, and showed that outlier groups were distinct for each implicated pathway. Moreover, our results suggest that by seeking heterogeneity, pathway-based outlier analysis can reveal expression signals that are not apparent when considering only shared group differences.

Larman, Salajegheh, Nazareno, Lam, Sauld, Steen, Kong, Pinkus, Amato, Elledge, et al. Cytosolic 5’-nucleotidase 1A autoimmunity in sporadic inclusion body myositis. Ann Neurol. 2013;73:408–18.

OBJECTIVE: We previously identified a circulating autoantibody against a 43 kDa muscle autoantigen in sporadic inclusion body myositis (IBM) and demonstrated the feasibility of an IBM diagnostic blood test. Here, we sought to identify the molecular target of this IBM autoantibody, understand the relationship between IBM autoimmunity and muscle degeneration, and develop an IBM blood test with high diagnostic accuracy. METHODS: IBM blood samples were screened using mass spectrometry and a synthetic human peptidome. Plasma and serum samples (N=200 patients) underwent immunoblotting assays, and results were correlated to clinical features. Muscle biopsy samples (n=30) were examined by immunohistochemistry and immunoblotting. Exome or whole genome sequencing was performed on DNA from 19 patients. RESULTS: Both mass spectrometry and screening of a 413,611 human peptide library spanning the entire human proteome identified cytosolic 5'-nucleotidase 1A (cN1A; NT5C1A) as the likely 43 kDa IBM autoantigen, which was then confirmed in dot blot and Western blot assays using recombinant cN1A protein. Moderate reactivity of anti-cN1A autoantibodies was 70% sensitive and 92% specific, and high reactivity was 34% sensitive and 98% specific for the diagnosis of IBM. One to 3 major cN1A immunodominant epitopes were identified. cN1A reactivity by immunohistochemistry accumulated in perinuclear regions and rimmed vacuoles in IBM muscle, localizing to areas of myonuclear degeneration. INTERPRETATION: Autoantibodies against cN1A are common in and highly specific to IBM among muscle diseases, and may provide a link between IBM's dual processes of autoimmunity and myodegeneration. Blood diagnostic testing is feasible and should improve early and reliable diagnosis of IBM.

Kong, Shimizu-Motohashi, Campbell, Lee, Collins, Brewster, Holm, Rappaport, Kohane, Kunkel. Peripheral blood gene expression signature differentiates children with autism from unaffected siblings. Neurogenetics. 2013;14:143–52.

Autism spectrum disorder (ASD) is one of the most prevalent neurodevelopmental disorders with high heritability, yet a majority of genetic contribution to pathophysiology is not known. Siblings of individuals with ASD are at increased risk for ASD and autistic traits, but the genetic contribution for simplex families is estimated to be less when compared to multiplex families. To explore the genomic (dis-) similarity between proband and unaffected sibling in simplex families, we used genome-wide gene expression profiles of blood from 20 proband-unaffected sibling pairs and 18 unrelated control individuals. The global gene expression profiles of unaffected siblings were more similar to those from probands as they shared genetic and environmental background. A total of 189 genes were significantly differentially expressed between proband-sib pairs (nominal p < 0.01) after controlling for age, sex, and family effects. Probands and siblings were distinguished into two groups by cluster analysis with these genes. Overall, unaffected siblings were equally distant from the centroid of probands and from that of unrelated controls with the differentially expressed genes. Interestingly, five of 20 siblings had gene expression profiles that were more similar to unrelated controls than to their matched probands. In summary, we found a set of genes that distinguished probands from the unaffected siblings, and a subgroup of unaffected siblings who were more similar to probands. The pathways that characterized probands compared to siblings using peripheral blood gene expression profiles were the up-regulation of ribosomal, spliceosomal, and mitochondrial pathways, and the down-regulation of neuroreceptor-ligand, immune response and calcium signaling pathways. Further integrative study with structural genetic variations such as de novo mutations, rare variants, and copy number variations would clarify whether these transcriptomic changes are structural or environmental in origin.

2012

Kohane, Hsing, Kong. Taxonomizing, sizing, and overcoming the incidentalome. Genet Med. 2012;14:399–404.

PURPOSE: With the advent of whole-genome sequencing made clinically available, the number of incidental findings is likely to rise. The false-positive incidental findings are of particular clinical concern. We provide estimates on the size of these false-positive findings and classify them into four broad categories. METHODS: Whole-genome sequences (WGS) of nine individuals were scanned with several comprehensive public annotation databases and average estimates for the number of findings. These estimates were then evaluated in the perspective of various sources of false-positive annotation errors. RESULTS: At present there are four main sources of false-positive incidental findings: erroneous annotations, sequencing error, incorrect penetrance estimates, and multiple hypothesis testing. Of these, the first two are likely to be addressed in the near term. Conservatively, current methods deliver hundreds of false-positive incidental findings per individual. CONCLUSION: The burden of false-positives in whole-genome sequence interpretation threatens current capabilities to deliver clinical-grade whole-genome clinical interpretation. A new generation of population studies and retooling of the clinical decision-support approach will be required to overcome this threat.

Song, Hwang, Hsing, Lee, Bohn, Kong. gSearch: a fast and flexible general search tool for whole-genome sequencing. Bioinformatics. 2012;28:2176–7.

BACKGROUND: Various processes such as annotation and filtering of variants or comparison of variants in different genomes are required in whole-genome or exome analysis pipelines. However, processing different databases and searching among millions of genomic loci is not trivial. RESULTS: gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a pre-compiled annotation or with variants in other genomes. Its search algorithms are subsequently optimized and implemented in a multi-threaded manner. The proposed method is not a stand-alone annotation tool with its own reference databases. Rather, it is a search utility that readily accepts public or user-prepared reference files in various formats including GVF, Generic Feature Format version 3 (GFF3), Gene Transfer Format (GTF), VCF and Browser Extensible Data (BED) format. Compared to existing tools such as ANNOVAR, gSearch runs more than 10 times faster. For example, it is capable of annotating 52.8 million variants with allele frequencies in 6 min. AVAILABILITY: gSearch is available at http://ml.ssu.ac.kr/gSearch. It can be used as an independent search tool or can easily be integrated to existing pipelines through various programming environments such as Perl, Ruby and Python.

He, Ma, Cao, Gise, Zhou, Xie, Zhang, Hsing, Christodoulou, Cahan, et al. Polycomb repressive complex 2 regulates normal development of the mouse heart. Circ Res. 2012;110:406–15.

RATIONALE: Epigenetic marks are crucial for organogenesis, but their role in heart development is poorly understood. Polycomb repressive complex 2 (PRC2) trimethylates histone H3 at lysine 27, which establishes H3K27me3 repressive epigenetic marks that promote tissue-specific differentiation by silencing ectopic gene programs. OBJECTIVE: We studied the function of PRC2 in murine heart development using a tissue-restricted conditional inactivation strategy. METHODS AND RESULTS: Inactivation of the PRC2 subunit Ezh2 by Nkx2-5(Cre) (Ezh2(NK)) caused lethal congenital heart malformations, namely, compact myocardial hypoplasia, hypertrabeculation, and ventricular septal defect. Candidate and genome-wide RNA expression profiling and chromatin immunoprecipitation analyses of Ezh2(NK) heart identified genes directly repressed by EZH2. Among these were the potent cell cycle inhibitors Ink4a/b (inhibitors of cyclin-dependent kinase 4 A and B), the upregulation of which was associated with decreased cardiomyocyte proliferation in Ezh2(NK). EZH2-repressed genes were enriched for transcriptional regulators of noncardiomyocyte expression programs such as Pax6, Isl1, and Six1. EZH2 was also required for proper spatiotemporal regulation of cardiac gene expression, because Hcn4, Mlc2a, and Bmp10 were inappropriately upregulated in ventricular RNA. PRC2 was also required later in heart development, as indicated by cardiomyocyte-restricted TNT-Cre inactivation of the PRC2 subunit Eed. However, Ezh2 inactivation by TNT-Cre did not cause an overt phenotype, likely because of functional redundancy with Ezh1. Thus, early Ezh2 inactivation by Nkx2-5(Cre) caused later disruption of cardiomyocyte gene expression and heart development. CONCLUSIONS: Our study reveals a previously undescribed role of EZH2 in regulating heart formation and shows that perturbation of the epigenetic landscape early in cardiogenesis has sustained disruptive effects at later developmental stages.

Sek Won Kong
