by Shelia Hoar Zahm, Sc.D.
Over the past five years, DCEG has been a leader in conducting genome-wide association studies (GWAS) that have identified common inherited genetic variants associated with cancer risk. The Division has also developed statistical methods to analyze the extensive data these studies generate. Three tenure-track investigators are helping to ensure that DCEG will continue to excel in developing sound approaches for complex analyses of large studies, rare genetic variants, copy number variation (CNV), gene expression, and DNA methylation, as well as for studies of metabolomics and the microbiome.
Joshua Sampson, Ph.D., joined the Biostatistics Branch (BB) in 2009 after receiving his doctorate from the University of Washington in Seattle and completing a postdoctoral fellowship at Yale University in New Haven, Connecticut. During his training, he focused on improving methods for identifying genetic variants that affect multiple traits and for assigning genotypes to single nucleotide polymorphisms (SNPs), a process referred to as “calling.” Since joining DCEG, Dr. Sampson has introduced new study designs to improve the efficiency of next-generation sequencing, developed a testing framework for identifying associations with groups of rare SNPs, and designed new methods for analyzing subgroup-specific genetic effects. Dr. Sampson is applying the latter methods in an exciting new GWAS, led by DCEG and collaborators from the Childhood Cancer Survivor Study, that evaluates the risk of secondary neoplasms in childhood cancer survivors. The wide variety of childhood and secondary cancers, together with potential interactions with the original cancer treatments, complicates the researchers’ efforts to identify genetic effects.
Dr. Sampson’s research extends beyond GWAS. For example, he and colleagues in the Nutritional Epidemiology Branch are leading an initiative to identify metabolites and metabolomic profiles that can be used as biomarkers of lifestyle-related exposures or as prospective markers of cancer risk. This work involves complex statistical challenges. Dr. Sampson explained, “In genetics, when trying to decompose an association into a cause and an effect, we can make certain assumptions, like the disease not being able to cause mutations in germline DNA. In metabolomics, however, we need to determine whether an associated metabolite is a symptom of the disease, a biomarker associated with a causal factor, a causal factor, or chance.”
Dr. Sampson noted, “I came to DCEG because the Division includes the biostatistician as an integral member of the scientific team and because the Biostatistics Branch offers a supportive, collegial environment that fosters high-quality collaborative work.”
Jianxin Shi, Ph.D., also joined BB in 2009 with an interest in developing statistical approaches for studying the role of genetics in cancer etiology and survival. During his doctoral training and postdoctoral fellowship at Stanford University in Palo Alto, California, Dr. Shi worked on GWAS of breast cancer and psychiatric disorders. At DCEG, he focuses on developing methods for detecting CNVs and testing them for association using data from GWAS genotyping platforms. Although researchers can accurately call SNP genotypes from GWAS data, detecting CNVs accurately is much more difficult because CNVs must be inferred from signal intensity. “Short CNVs can be particularly difficult to call accurately,” Dr. Shi explained. “We have been trying to integrate information across unrelated subjects and family members to improve the accuracy of CNV detection and the power of association testing.”
Dr. Shi is also interested in evaluating the role of genetics in cancer survival. Many studies show that tumor genomes influence survival, but researchers do not yet know whether a patient’s own germline genome affects survival. Dr. Shi has developed a method for estimating the genetic heritability of cancer survival using GWAS data. A recent analysis found almost no contribution of common germline genetic variation to survival after lung cancer. “We plan to expand this work to other cancers,” Dr. Shi stated. “It may be that only rare variants influence survival.”
Collaborating with Neil E. Caporaso, M.D., Chief of the Genetic Epidemiology Branch (GEB), and Maria Teresa Landi, M.D., Ph.D. (GEB), who lead the Environment And Genetics in Lung cancer Etiology (EAGLE) study, Dr. Shi works on integrating the study’s diverse data. These data include information on GWAS SNPs, exome sequencing, gene expression, methylation, mRNA, and microRNA. The team’s goal is to provide a comprehensive molecular characterization of lung cancer tumors. The analysis also will use data from The Cancer Genome Atlas project.
As if human and tumor genetics were not complicated enough, Dr. Shi is developing methods for analyzing studies of the microbiome (i.e., all the microorganisms, such as bacteria, that live in or on the human body) in relation to cancer. Researchers must consider many layers of data, including taxonomic levels such as species and genus, measures of alpha and beta diversity, and other features, any one of which could be important. Dr. Shi noted, “I love working on the variety of projects here in DCEG. The important scientific questions motivate me to develop new analytic techniques to find answers.”
Sonja I. Berndt, Pharm.D., Ph.D., an investigator in the Occupational and Environmental Epidemiology Branch (OEEB), works with DCEG biostatisticians to refine and apply new methods for analyzing genetic data in her research. While training for a doctorate in pharmacy at the University of Michigan in Ann Arbor, Dr. Berndt first encountered epidemiology and found that she preferred it to pharmacy. She subsequently decided to pursue an additional doctorate in epidemiology at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. At Johns Hopkins, she was referred to Dr. Richard Hayes, formerly of OEEB, who mentored her thesis research on genetic variation related to colorectal neoplasia.
As a DCEG tenure-track investigator, Dr. Berndt has worked on many GWAS. Several of these projects have had interesting features that prompted her to work with Dr. Sampson, Nilanjan Chatterjee, Ph.D., Chief of BB, and other scientists to develop new methods for analysis. She is currently working on a GWAS of non-Hodgkin lymphoma (NHL) with mentors Nathaniel Rothman, M.D., M.P.H., M.H.S. (OEEB), and Stephen J. Chanock, M.D., Chief of the Laboratory of Translational Genomics (LTG) and Director of the Cancer Genomics Research Laboratory. “The comparison of the genetic architecture across NHL subtypes is perhaps the most interesting part of this study,” Dr. Berndt noted. To gain information on more variants while avoiding the expense of fine mapping, she and her mentors are expanding the scan data by imputing genotypes based on data generated by the 1000 Genomes Project.
In collaboration with Laufey Amundadottir, Ph.D. (LTG), Sarah E. Daugherty, Ph.D., M.P.H., of the Hormonal and Reproductive Epidemiology Branch, and Dr. Sampson, Dr. Berndt has been investigating the genetic determinants of prostate-specific antigen (PSA) values based on a GWAS of data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Each study subject had six PSA measurements, and the analyses focus on three outcomes: the baseline value, the average of the six measurements, and the PSA velocity (the rate at which the PSA level increases from year to year across the six measurements). Dr. Sampson’s expertise has been invaluable in analyzing these complex outcome variables.
Within the Genetic Investigation of Anthropometric Traits Consortium, Dr. Berndt studies genetic variants related to obesity, height, and other anthropometric traits. The consortium has not only provided valuable insight into the genetic determinants of these important cancer risk factors but has also contributed to a greater understanding of the genetic architecture of complex diseases and traits in general. More data are available for obesity and other anthropometric variables than for cancer, so these areas offer more opportunities to develop new statistical methods that researchers can later apply to cancer studies.
The biggest challenge to these investigators’ research is NCI’s fiscal constraints. “While the cost of genotyping has decreased over time, the costs for data handling and storage and computing needs of such large datasets are substantial. Our computing and storage capacity are no longer sufficient for the extensive data in our genetic studies,” said Dr. Berndt. “We will need to develop greater infrastructure for computing or change the way we analyze and store data so that we will be able to continue the progress we are making in understanding the role of genetics in cancer etiology.”