BB Seminar: Statistical challenges for the analysis of human longevity data in families

Biostatistics Branch Seminar Series



Jeanine Houwing-Duistermaat, PhD
Departments of Medical Statistics and Bioinformatics
Leiden University Medical Center
The Netherlands


Although several studies provide evidence that longevity aggregates within families, the identification of genetic factors has not been successful. Reasons for the lack of progress might be the heterogeneity across epidemiological studies, which are combined in a simplistic way in genome-wide association meta-analyses, and the small sample sizes of family studies. In addition, the effects of covariates will probably change over time; that is, genetic factors may play a role only in a specific age interval.

We will consider survival models for the analysis of longevity in family studies, using external information on population survival to gain statistical efficiency. The challenges are to model the ascertainment of the families, to take into account the correlation between family members, to deal with delayed entry, and to model age-dependent effects.

This work is motivated by two studies comprising families with at least two nonagenarian siblings: the Leiden Longevity Study (LLS) (420 families) and the European Genetics of Healthy Aging Consortium (GEHA) (2000 families from 15 centers). Genome-wide SNP arrays are available in both studies. For the LLS, ten years of follow-up are available; 13% of the participants are still alive, and the maximum observed age is 107 years. For the GEHA, we do not have this information yet.

For genetic linkage analysis of the ages of the siblings at entry into the study, we propose a robust Haseman-Elston method (score statistic) based on martingale residuals. The martingale residuals are computed from population-, sex- and birth-cohort-specific life tables. The resulting statistic assigns more weight to older sibling pairs and takes into account heterogeneity across centers in GEHA. In both studies, regions showing genome-wide significance were identified. For the analysis of follow-up data in the LLS, we propose a Cox model with a gamma frailty.
The frailty distribution appears to depend on the selection of the sibling pairs, which makes maximization of the log-likelihood function challenging. A penalized likelihood approach is proposed.
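
As a rough illustration of the linkage statistic described above, the sketch below computes martingale residuals from a life table and combines them in a Haseman-Elston-type score statistic for sibling pairs. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names, the toy life table (a constant hazard of 0.1 per year from age 80) and the null variance of 1/8 for the proportion of alleles shared identical by descent (which holds only for fully informative markers) are all assumptions made for illustration. The residual is taken as the event indicator minus the cumulative population hazard up to the observed age, so siblings who are alive at higher ages have larger residuals in absolute value and their pairs receive more weight.

```python
import math

# Hypothetical toy life table: annual death probabilities q_x from age 80 onward,
# chosen so that the hazard is a constant 0.1 per year. Not real population data.
START_AGE = 80
TOY_QX = [1.0 - math.exp(-0.1)] * 30


def cumulative_hazard(qx, start_age, age):
    """Cumulative hazard accumulated between start_age and age (whole years)."""
    n_years = age - start_age
    if n_years < 0 or n_years > len(qx):
        raise ValueError("age is outside the range covered by the life table")
    # A yearly death probability q corresponds to a yearly hazard of -log(1 - q).
    return -sum(math.log(1.0 - q) for q in qx[:n_years])


def martingale_residual(died, age, qx=TOY_QX, start_age=START_AGE):
    """Event indicator (1 = died, 0 = alive) minus the expected cumulative hazard."""
    return died - cumulative_hazard(qx, start_age, age)


def score_statistic(pairs):
    """Haseman-Elston-type score statistic for sibling pairs.

    pairs: iterable of (pi_hat, m1, m2), where pi_hat is the estimated proportion
    of alleles shared identical by descent at a locus and m1, m2 are the
    martingale residuals of the two siblings.
    Assumes a null mean of 1/2 and a null variance of 1/8 for pi_hat.
    """
    numerator = 0.0
    variance = 0.0
    for pi_hat, m1, m2 in pairs:
        weight = m1 * m2  # larger for older pairs, whose residuals are larger
        numerator += weight * (pi_hat - 0.5)
        variance += weight ** 2 * 0.125
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # Two sibling pairs, all alive at entry (died = 0); the ages are made up.
    pair_a = (0.75, martingale_residual(0, 92), martingale_residual(0, 95))
    pair_b = (0.50, martingale_residual(0, 90), martingale_residual(0, 91))
    print(score_statistic([pair_a, pair_b]))
```

In practice the life table would be specific to population, sex and birth cohort, and the variance of the sharing proportion would be estimated from the marker data rather than fixed at 1/8.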