Discovering the causes of cancer and the means of prevention

Barry I. Graubard, Ph.D.

Senior Investigator

Organization: National Cancer Institute
Division of Cancer Epidemiology & Genetics, Biostatistics Branch
Address: NCI Shady Grove
Room 7E140


Dr. Graubard received a Ph.D. in mathematics from the University of Maryland in 1991. He began his career as a mathematical statistician at the National Center for Health Statistics in 1977, and held research positions at the Alcohol, Drug Abuse, and Mental Health Administration and the National Institute of Child Health and Human Development. Dr. Graubard joined the NCI in 1990. He received the American Statistical Association and Biometric Society Snedecor Award for Applied Statistical Research in 1990, and he is a Fellow of the American Statistical Association.

Research Interests

National health survey data are used for many purposes by the NCI, including cancer surveillance as well as descriptive and analytical epidemiology studies. Surveys provide national and subgroup estimates of the prevalence of cancer risk factors, and subjects surveyed can be followed as nationally representative cohorts for estimating associations between risk factors and cancer incidence. When analyzing data from national surveys, attention needs to be given to their complex sample designs. These designs often use multiple stages of cluster sampling to obtain survey subjects, and require sample weighting to make the survey data representative of the target population. Our collaborations with biomedical researchers have led us to develop statistical methods for using national health survey data in addressing issues in cancer etiology and surveillance.
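As a minimal sketch of why sample weighting matters, the example below compares an unweighted prevalence with a weighted ratio estimate, sum(w*y)/sum(w). The function name and the data are hypothetical and chosen only for illustration; they do not come from any NCI survey.

```python
def weighted_prevalence(indicators, weights):
    """Weighted ratio estimate of a prevalence: sum(w * y) / sum(w).

    indicators: 0/1 values (1 = respondent has the risk factor)
    weights:    sample weights (roughly, the number of people in the
                target population each respondent represents)
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(indicators, weights)) / total_weight


# Hypothetical respondents: the two with the risk factor carry small weights,
# for example because their subgroup was oversampled.
has_risk_factor = [1, 0, 0, 1, 0]
sample_weight = [100.0, 400.0, 400.0, 50.0, 50.0]

unweighted = sum(has_risk_factor) / len(has_risk_factor)
weighted = weighted_prevalence(has_risk_factor, sample_weight)

print(f"unweighted prevalence: {unweighted:.2f}")
print(f"weighted prevalence:   {weighted:.2f}")
```

In this made-up sample the unweighted figure overstates the prevalence because the respondents with the risk factor represent relatively few people in the population.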

Survey Methods Research

We developed methods for efficiently testing regression parameters for data from surveys with highly inefficient sample designs. These designs can have widely variable sample weights, resulting in much larger standard errors than one would obtain from a simple random sample of the same size. In addition, many national surveys have limited degrees of freedom for estimating standard errors because of small numbers of first-stage sampled clusters. Our methods involve augmenting the regression model with independent variables that determine the sample weights. This approach models the effect of the sample weighting without explicitly weighting the regression analysis. To address the limited degrees of freedom, we base the variance estimation on the more numerous clusters at later stages of sampling, which results in more degrees of freedom.
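The cost of widely variable weights can be illustrated with Kish's approximate design effect, n * sum(w^2) / (sum(w))^2. This is a standard rule of thumb and is swapped in here purely for illustration; it is not the regression-augmentation method described above, and the weights are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weighting.

    Equals 1.0 when all weights are equal and grows as the weights
    become more variable.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must not be empty")
    sum_w = sum(weights)
    sum_w_squared = sum(w * w for w in weights)
    return n * sum_w_squared / (sum_w * sum_w)


def effective_sample_size(weights):
    """Approximate size of a simple random sample with the same precision."""
    return len(weights) / kish_design_effect(weights)


equal_weights = [10.0, 10.0, 10.0, 10.0]     # hypothetical
variable_weights = [1.0, 1.0, 1.0, 9.0]      # hypothetical

for label, w in [("equal", equal_weights), ("variable", variable_weights)]:
    print(label,
          "design effect:", round(kish_design_effect(w), 3),
          "effective n:", round(effective_sample_size(w), 3))
```

Under this approximation, four respondents with the variable weights carry about as much information as fewer than two respondents from a simple random sample.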

We also conducted research into other statistical methods for analyzing survey data, including:

(1) graphing survey data with local linear kernel density smoothing adapted for weighted data, and developing jackknife methods for estimating the pointwise standard errors of mean smoothed curves;
(2) generalized direct standardized estimation for linear and nonlinear regression models, in which adjusted treatment effects are standardized to a distribution of the covariates, with estimated design-based standard errors;
(3) Wald tests for goodness-of-fit for logistic regression models that use the F-distribution and a Monte Carlo simulated distribution; and
(4) estimating variances for superpopulation parameters.
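To make the jackknife idea in item (1) concrete, here is a minimal sketch of a delete-one-cluster jackknife standard error for a weighted mean. It assumes a single stratum and treats each cluster as one sampling unit; it is a simplified stand-in, not the smoothed-curve method itself, and all names and data are hypothetical.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def jackknife_se(values, weights, clusters):
    """Delete-one-cluster jackknife for a weighted mean (single stratum).

    Each replicate drops one whole cluster and recomputes the estimate.
    The usual k/(k-1) reweighting of the remaining units cancels in a
    ratio estimate such as a weighted mean, so it is omitted here.
    Returns (full-sample estimate, jackknife standard error).
    """
    full_estimate = weighted_mean(values, weights)
    cluster_ids = sorted(set(clusters))
    k = len(cluster_ids)
    if k < 2:
        raise ValueError("need at least two clusters")

    replicates = []
    for dropped in cluster_ids:
        kept = [(y, w) for y, w, c in zip(values, weights, clusters)
                if c != dropped]
        kept_values = [y for y, _ in kept]
        kept_weights = [w for _, w in kept]
        replicates.append(weighted_mean(kept_values, kept_weights))

    variance = (k - 1) / k * sum((r - full_estimate) ** 2 for r in replicates)
    return full_estimate, sqrt(variance)


# Hypothetical data: six respondents in three clusters, equal weights.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights = [1.0] * 6
clusters = ["a", "a", "b", "b", "c", "c"]

estimate, se = jackknife_se(values, weights, clusters)
print(f"estimate = {estimate:.3f}, jackknife SE = {se:.3f}")
```

For a pointwise standard error of a smoothed curve, the same loop could in principle be run with the smoother evaluated at each grid point in place of `weighted_mean`.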

Dr. Graubard and Dr. Edward Korn of NCI's Division of Cancer Treatment and Diagnosis have written a graduate-level textbook entitled "Analysis of Health Surveys," which provides a compilation of practical statistical techniques for use in analyzing health survey data.

Biostatistical Methodology

Correlated observations from cluster samples occur in meta-analyses, where each study or experiment is a cluster, and in nonrandomized community studies, where the community is the cluster. We developed statistical methods to address this correlation in meta-analyses and community studies. For a meta-analysis of animal experiments that tested for the effect of dietary fat and total caloric intake on mammary tumorigenesis, we developed sandwich estimates of variance for conditional logistic regression that were robust to model misspecification. We are developing statistical methods for analyzing changes in the prevalence of smoking between states (where the state is the cluster) that did or did not receive resources to promote smoking cessation in the nonrandomized American Stop Smoking Intervention Study (ASSIST). These methods include variance estimation for nonparametric smoothing of tobacco sales data using bootstrap techniques, and regression methods involving random effects models with time-dependent covariates to estimate the effectiveness of ASSIST in reducing tobacco consumption and prevalence.
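The sandwich idea can be shown on the simplest possible estimator, a sample mean with clustered observations. The sketch below is an illustration under that simplification, not the conditional logistic regression estimator described above; function names and data are hypothetical, and no small-sample correction is applied.

```python
from collections import defaultdict


def naive_variance_of_mean(values):
    """Variance of the mean treating all observations as independent:
    sum(residual^2) / n^2 (no degrees-of-freedom correction)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n * n)


def cluster_robust_variance_of_mean(values, clusters):
    """Sandwich (cluster-robust) variance of the mean.

    Residuals are summed within each cluster before squaring, so
    within-cluster correlation is reflected in the variance:
    sum over clusters of (sum of residuals in cluster)^2, divided by n^2.
    """
    n = len(values)
    mean = sum(values) / n
    residual_sum_by_cluster = defaultdict(float)
    for y, c in zip(values, clusters):
        residual_sum_by_cluster[c] += y - mean
    meat = sum(s * s for s in residual_sum_by_cluster.values())
    return meat / (n * n)


# Hypothetical data: observations within a cluster resemble each other.
values = [2.0, 4.0, 6.0, 8.0]
clusters = ["A", "A", "B", "B"]

naive = naive_variance_of_mean(values)
robust = cluster_robust_variance_of_mean(values, clusters)
print(f"naive variance:  {naive:.3f}")
print(f"robust variance: {robust:.3f}")
```

With positively correlated observations inside each cluster, the robust variance exceeds the naive one, which is why ignoring clustering tends to understate uncertainty.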

Epidemiologic Collaboration

We collaborate on the design and analyses of a wide range of epidemiologic studies. We are working with NCI investigators on a study to evaluate the accuracy of reporting of cancer in first- and second-degree relatives. The sensitivity and specificity of the reporting will be estimated using the Connecticut cancer registry, records from the Health Care Financing Administration, and personal medical records to validate reports about family members from a population-based random sample of individuals living in Connecticut. Analyses from the NHANES I Epidemiologic Followup Study cohort are being conducted to examine associations between physical activity and the incidence of breast cancer, and between aspirin intake and total, cancer, and cardiovascular mortality. Analyses of data from participants in the Breast Cancer Detection Demonstration Project found that women in the upper 25% of diet quality had about a 30% reduction in mortality.
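For readers unfamiliar with the two accuracy measures named above, the following sketch computes sensitivity and specificity from paired records. The function name and data are hypothetical; a real analysis of a population sample would also account for the sample design.

```python
def sensitivity_specificity(reported, confirmed):
    """Compare self-reports with a validation source.

    reported:  True if the respondent reported cancer in the relative
    confirmed: True if the validation records confirm the cancer
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(reported, confirmed))
    true_pos = sum(1 for r, c in pairs if r and c)
    false_neg = sum(1 for r, c in pairs if not r and c)
    true_neg = sum(1 for r, c in pairs if not r and not c)
    false_pos = sum(1 for r, c in pairs if r and not c)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one confirmed case and one non-case")

    sensitivity = true_pos / (true_pos + false_neg)   # confirmed cases reported
    specificity = true_neg / (true_neg + false_pos)   # non-cases not reported
    return sensitivity, specificity


# Hypothetical validation data for six relatives.
reported = [True, True, False, False, True, False]
confirmed = [True, False, False, True, True, False]

sens, spec = sensitivity_specificity(reported, confirmed)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```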