Research

Clinical whole genome sequencing

Population-scale genome sequencing, such as that of the more than 1 million volunteers of the Precision Medicine Initiative, will highlight phenotype-associated genomic variants. The analytical validity of genome sequencing is not yet satisfactory, in part because of limitations in the analytic pipelines.

1. False positives and false negatives in genomic variant calling:

The monoploid representation of the human reference genome is one of the largest sources of erroneous variant calls, especially for individuals of non-European ancestry. Short-read mapping and variant calling procedures introduce further biases. Ensemble and/or joint variant calling approaches can be an immediate solution, but at a significant computational cost. We are sorting out the sources of inaccurate variant calling in whole genome sequencing to find the current best approach.
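As a rough illustration of the ensemble idea, the sketch below keeps only variants reported by a minimum number of independent callers. It is a minimal sketch, not our production pipeline: the function name `ensemble_calls`, the `min_support` parameter, and the toy call sets are all assumptions made for illustration, and variants are reduced to simple (chromosome, position, ref, alt) tuples.

```python
from collections import Counter

def ensemble_calls(call_sets, min_support=2):
    """Return variants reported by at least `min_support` callers.

    `call_sets` is a list with one collection of variant tuples per caller.
    (Hypothetical helper; real pipelines must also normalize variant
    representation before comparing calls.)
    """
    counts = Counter()
    for calls in call_sets:
        counts.update(set(calls))  # each caller votes once per variant
    return {variant for variant, n in counts.items() if n >= min_support}

# Toy call sets from three hypothetical callers.
caller_a = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")}
caller_b = {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A")}
caller_c = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")}

consensus = ensemble_calls([caller_a, caller_b, caller_c], min_support=2)
```

Raising `min_support` trades false positives for false negatives, and every additional caller multiplies the compute needed per genome, which is the cost noted above.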

2. Predicting impact of genomic variants:

Population-specific allele frequency is one of the most useful pieces of information for predicting variant impact. Such information should be effectively combined with other sources of variant annotation. The gNOME pipeline enables users to prioritize variants of interest through a graphical user interface. It also enables comparison of one or more genomes with a group of genomes (a population). A fast annotation engine, gSearch, makes it possible to process hundreds to thousands of genomes in a reasonable amount of time.
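The sketch below shows one simple way that population-specific allele frequency could be combined with another annotation (predicted consequence) to prioritize variants. It does not reflect the actual gNOME or gSearch interfaces: the `Variant` class, the `prioritize` function, the 1% frequency cutoff, and the example records are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    variant_id: str
    pop_af: dict        # population label -> allele frequency (toy values)
    consequence: str    # predicted consequence from some annotation source

def prioritize(variants, population, max_af=0.01,
               consequences=frozenset({"missense", "stop_gained"})):
    """Keep rare, potentially damaging variants for the given population.

    A variant absent from the population table is treated as frequency 0.
    Results are ordered from rarest to most common.
    """
    def af(v):
        return v.pop_af.get(population, 0.0)

    kept = [v for v in variants
            if af(v) <= max_af and v.consequence in consequences]
    return sorted(kept, key=af)

# Toy records: v1 is rare overall but common in the hypothetical "POP_B" population.
variants = [
    Variant("v1", {"POP_A": 0.001, "POP_B": 0.15}, "missense"),
    Variant("v2", {"POP_A": 0.0002}, "stop_gained"),
    Variant("v3", {"POP_A": 0.0005, "POP_B": 0.0004}, "synonymous"),
]

ranked_a = [v.variant_id for v in prioritize(variants, "POP_A")]
ranked_b = [v.variant_id for v in prioritize(variants, "POP_B")]
```

The same variant (`v1`) is retained when evaluated against one population and filtered out against another, which is why a population-matched frequency matters more than a single global frequency.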

3. Polygenic scores:

Genetic burden across multiple genetic variants can be summarized as a unitless genetic risk score by comparing an individual’s score to the population norm. Such polygenic scores can be used to identify subgroups of individuals with similar profiles. A multivariate statistical approach can help define subgroups according to their genomic profiles.
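One common way to obtain such a unitless score is to compute a weighted sum of allele dosages and standardize it against a reference population. The sketch below assumes that formulation; the function names, the per-variant weights, and the dosage values are hypothetical, and the weights would normally come from an external association study.

```python
from statistics import mean, pstdev

def raw_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))

def standardized_score(dosages, weights, population_dosages):
    """Express an individual's score in population standard deviations."""
    pop_scores = [raw_score(p, weights) for p in population_dosages]
    mu, sigma = mean(pop_scores), pstdev(pop_scores)
    if sigma == 0:
        raise ValueError("population scores have zero variance")
    return (raw_score(dosages, weights) - mu) / sigma

# Toy example with three variants (all values are illustrative).
weights = [0.2, -0.1, 0.5]
population = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
individual = [2, 1, 0]

raw = raw_score(individual, weights)
z = standardized_score(individual, weights, population)
```

Because the standardized score has no units, scores built from different variant sets can be placed side by side as features for the multivariate subgrouping described above.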

Biomarkers of autism spectrum disorder

Autism spectrum disorder (ASD) is a group of neurodevelopmental disorders whose etiology is unknown in the majority of cases. To identify diagnostic, treatment, and prognostic biomarkers, genomic profiling has been used extensively over the last several years.

1. Blood gene expression profiles in ASD:

Several hundred genes are differentially expressed in patients with ASD compared to age- and gender-matched normally developing children. No single gene is consistently up- or down-regulated in a majority of patients; however, 20-30% of patients share similar gene expression changes. Clinical characteristics do not seem to explain the differences between genomically similar groups. We are investigating other factors, such as sequence variation and metabolic profiles, that contribute to the clustering of patients with ASD.

2. Metabolomic characteristics of ASD:

Recent twin studies on heritability have reported that shared environmental factors may explain a larger proportion of the variance in liability (41-52%) than heritable genetic factors do (38-49%). Therefore, the variance explained by genetic factors may be equal to, or less than, that explained by environmental risk factors. We hypothesize that environmental exposures contributing to ASD create an exposure history that is reflected in transcriptome and environmental metabolomics profiles. The transcriptome-metabolome interaction study therefore provides an opportunity to understand etiologic factors.