
Statistical Analysis of High Dimensional Data in Molecular Epidemiology - Josh Sampson, Ph.D.

DCEG Seminar


Joshua Sampson, Ph.D., Investigator, Biostatistics Branch

Division of Cancer Epidemiology and Genetics, National Cancer Institute


Dr. Sampson will discuss statistical methods for analyzing the high-dimensional data often collected in molecular epidemiology, focusing on his recent work in metabolomics and genetics. He will start by showing that, despite moderate day-to-day variability in metabolite levels, epidemiological studies with only one sample per individual should be sufficiently powered to detect metabolite-disease associations. He will then introduce the metabolomic studies currently being conducted at DCEG, discuss the comprehensive analysis pipeline in place for detecting associations, and describe novel methodology for finding metabolites that mediate relationships between known risk factors and diseases.
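
For readers unfamiliar with why day-to-day variability matters for power, the following is a minimal sketch of the standard measurement-error argument, not Dr. Sampson's own method. It assumes classical (independent, additive) within-person variation, so that a single sample has reliability equal to the intraclass correlation (ICC), and it uses the Fisher z approximation for the sample size needed to detect a correlation. The function names and all numeric inputs are hypothetical and chosen only for illustration.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def icc(between_var: float, within_var: float) -> float:
    """Reliability of a single sample: between-person variance / total variance."""
    return between_var / (between_var + within_var)


def observed_correlation(true_corr: float, reliability: float) -> float:
    """Correlation with the outcome after attenuation by classical measurement error."""
    return true_corr * sqrt(reliability)


def required_n(corr: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect a correlation (two-sided, Fisher z)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(corr)) ** 2 + 3)


if __name__ == "__main__":
    # Hypothetical values: equal between- and within-person variance, true r = 0.10.
    reliability = icc(between_var=1.0, within_var=1.0)
    r_obs = observed_correlation(0.10, reliability)
    print(f"ICC = {reliability:.2f}, observed r = {r_obs:.3f}")
    print("n with perfect measurement:", required_n(0.10))
    print("n with one sample per person:", required_n(r_obs))
```

Under these assumptions the required sample size grows by roughly a factor of 1/ICC, which is why moderate within-person variability raises, but does not necessarily prohibit, the sample size needed for a single-sample design.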

Dr. Sampson will then discuss his research on maximizing the information collected in genome-wide association studies (GWAS). He will first focus on a novel, cost-effective, two-platform study design that genotypes all individuals on a low-coverage array and then supplements those data by genotyping only a small proportion of the participants on a dense array. He will then describe his new chromosome-based Quasi Likelihood Score (cQLS) statistic, which incorporates local Identity-By-Descent (IBD) to increase the power to detect associations in GWAS of related individuals.