
Haoyu Zhang, Ph.D.

Earl Stadtman Investigator

NCI Shady Grove | Room 7E628


Haoyu Zhang, Ph.D., was appointed Earl Stadtman tenure-track investigator in the Biostatistics Branch (BB) in August 2022. He received his Ph.D. in biostatistics from the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland, in 2019 and his B.S. in statistics from Zhejiang University in Hangzhou, China.

Visit Dr. Zhang on GitHub

Dr. Zhang focused his Ph.D. dissertation on testing for genetic association and building risk prediction models for cancer incorporating tumor characteristics. He also led several analyses through multidisciplinary, international collaborations using the largest breast cancer genome-wide association study (GWAS) dataset from the Breast Cancer Association Consortium (BCAC).

Prior to joining DCEG, Dr. Zhang received postdoctoral training at the Harvard T.H. Chan School of Public Health in Boston, MA, where he developed a robust Mendelian Randomization (MR) method to estimate causal effects in observational studies and a multi-ancestry polygenic risk score (PRS) method to improve predictions in underrepresented populations. He evaluated the multi-ancestry PRS approach on 5.1 million subjects using data from 23andMe, the Global Lipids Genetics Consortium, UK Biobank and All of Us. During his postdoctoral training, he received the K99/R00 Pathway to Independence Award from the National Cancer Institute.

Research Interests

As an Earl Stadtman investigator, Dr. Zhang’s research focuses on developing and applying scalable statistical methods for analyzing multi-ancestry genetic and biobank data and translating these findings to clinical settings to inform prevention and therapeutic strategies. He also collaborates across the Division to develop and apply advanced statistical methods to ongoing and upcoming DCEG research studies. This work includes investigating the genetic architecture of complex traits and diseases, conducting multi-ancestry association testing, developing multi-ancestry PRS, and estimating heritability for the Confluence Project, to date the largest and most diverse genome-wide association study of breast cancer. In addition, Dr. Zhang collaborates with several large consortia, including the Breast Cancer Association Consortium (BCAC); the International Lung Cancer Consortium (ILCCO); and the Polygenic Risk Method in Diverse Populations Consortium (PRIMED), an NIH-funded consortium aiming to develop and evaluate methods to improve the use of PRS to predict disease and health outcomes in diverse ancestry populations. Dr. Zhang also co-chairs the Simulation and Benchmarking Working Group in PRIMED.

In accordance with DCEG’s commitment to FAIR principles, the software Dr. Zhang develops for all projects will be open-access, user-friendly, and suitable for high-performance computing clusters and cloud platforms of the NIH Data Commons.

Dr. Zhang's Research Team

Dr. Zhang's research is supported by a team of trainees and scientific staff.

Selected Software

Two-stage Polytomous Model (TOP)

The TOP package implements a two-stage polytomous regression framework for analyzing cancer data with multivariate tumor characteristics.


The application allows rapid power and sample size analysis for a variety of genetic association tests by specification of a few key parameters.

Composite Kernel Machine Regression based on Likelihood Ratio Test (CKLRT)

The CKLRT package implements a kernel machine regression framework that models the overall genetic effect of a SNP set while accounting for possible gene-environment (GE) interaction.

