CGEN - R package - Read Me

  • 1. Download and install the R packages survival and optmatch if they are not already installed.
  • 2. Download the zip or tar.gz file for your operating system from the download page.
  • 3. Install from the R (GUI) package installer or with the install.packages() function, e.g., install.packages("C:/temp/", repos=NULL)
  • 4. Load the CGEN package from the GUI or with the command library(CGEN)
  • 5. Type ?CGEN in the R console to access the help files and package manual
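
Steps 1, 3 and 4 can also be run from the R console. The sketch below is one possible way to do that, not an official installer: missing_packages and install_cgen are hypothetical helper names, and archive_path stands for wherever the file downloaded in step 2 was saved.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper covering steps 1, 3 and 4.
# `archive_path` is the full path to the zip or tar.gz file from step 2.
install_cgen <- function(archive_path) {
  deps <- missing_packages(c("survival", "optmatch"))
  if (length(deps) > 0) install.packages(deps)   # step 1: dependencies
  install.packages(archive_path, repos = NULL)   # step 3: local archive
  library(CGEN)                                  # step 4: load the package
}
```

Nothing is installed until install_cgen() is called with the location of the downloaded file.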


This package is for logistic regression analyses of SNP data in case-control studies. It is designed to give users the flexibility to choose among a number of different methods for analyzing SNP-environment or SNP-SNP interactions. The power of interaction analysis in case-control studies can be greatly enhanced if the factors under study (e.g., two SNPs) can be assumed to be independently distributed in the underlying population. The package implements a number of methods that incorporate such independence constraints into the analysis of interactions, for both unmatched and matched case-control studies. These methods are more general and flexible than the popular case-only method of interaction analysis, which also assumes gene-gene and/or gene-environment independence in the underlying population. The package also implements various methods, based on shrinkage estimation and conditional likelihoods, that can automatically adjust for possible violations of the independence assumption arising from a direct causal relationship (e.g., between a gene and a behavioral exposure) or an indirect correlation (e.g., due to population stratification). A number of convenient summary and printing functions are included. The package will continue to be updated with new methods as they are developed. The methods are currently not suitable for analysis of SNPs on sex chromosomes.
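
To see why an independence assumption can help, consider the simplest setting: a binary genetic factor G, a binary exposure E, and a saturated logistic model. There the interaction log odds ratio equals the G-E log odds ratio among cases minus that among controls. The case-only estimator drops the control term, which is justified when G and E are independent in the population and the disease is rare. The base-R sketch below compares the two on invented cell counts (hypothetical numbers, chosen so that G and E are exactly independent among controls). It illustrates only the case-only comparison, not the more general methods implemented in the package.

```r
# Hypothetical cell counts, invented for illustration (rows: G, columns: E).
cases <- matrix(c(200, 100, 100, 100), nrow = 2,
                dimnames = list(G = c("0", "1"), E = c("0", "1")))
controls <- matrix(c(280, 120, 70, 30), nrow = 2,
                   dimnames = list(G = c("0", "1"), E = c("0", "1")))

# Log odds ratio of a 2 x 2 table and its large-sample (Woolf) variance.
log_or <- function(tab) log(tab[1, 1] * tab[2, 2] / (tab[1, 2] * tab[2, 1]))
var_log_or <- function(tab) sum(1 / tab)

# Standard case-control estimate of the multiplicative interaction.
cc_est <- log_or(cases) - log_or(controls)
cc_se  <- sqrt(var_log_or(cases) + var_log_or(controls))

# Case-only estimate: uses the cases alone, assuming G-E independence.
co_est <- log_or(cases)
co_se  <- sqrt(var_log_or(cases))

print(round(c(cc_est = cc_est, cc_se = cc_se,
              co_est = co_est, co_se = co_se), 3))
```

With these counts the two estimates coincide, but the case-only standard error is smaller because the sampling variability of the controls no longer enters. If the independence assumption fails, the case-only estimate is biased, which is the situation the shrinkage and conditional-likelihood methods are meant to handle.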


The main functions for unmatched data are snp.logistic and snp.scan.logistic. Whereas snp.logistic analyzes one SNP per function call, snp.scan.logistic analyzes a collection of SNPs and writes the summary results to an external file. The input to snp.logistic is a data frame in which the SNP variable must be coded as 0-1-2 (or 0-1). If it is not, recode.geno can be used to recode the SNP variable before calling snp.logistic. The functions getSummary, getWaldTest and snp.effects take the object returned by snp.logistic and create summary tables, compute Wald tests, and compute joint/stratified effects, respectively (see Examples in snp.logistic). With snp.scan.logistic, the data are read from external files defined in snp.list and pheno.list. The collection of p-values computed by snp.scan.logistic can be plotted using the functions QQ.plot and chromosome.plot.
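
As an illustration of the 0-1-2 coding expected by snp.logistic, the base-R sketch below counts copies of a chosen allele in two-letter genotype strings. count_allele is a hypothetical helper written for this example and is not part of CGEN. recode.geno is the package's own tool for this task, and its arguments are described in its help page.

```r
# Hypothetical helper (not part of CGEN): count copies of `allele`
# in genotype strings such as "AA", "AG" or "GG".
count_allele <- function(genotypes, allele) {
  vapply(strsplit(genotypes, ""),
         function(chars) sum(chars == allele),
         integer(1))
}

# Invented genotypes for five subjects; NA marks a missing call.
geno <- c("AA", "AG", "GG", "GA", NA)
snp <- count_allele(geno, "G")
print(snp)
```

Heterozygotes receive the same code regardless of allele order, and missing genotypes stay missing.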

The function for analysis of matched case-control data is snp.matched. Optimal matching can be obtained from the function getMatchedSets.

This package contains sample genotype data SNPdata, sample covariate data Xdata, and sample SNP metadata LocusMapData. The current version of the package is only suitable for analysis of SNPs on non-sex chromosomes.
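
A single-SNP analysis on the bundled sample data might look like the sketch below. It is wrapped in a hypothetical function, fit_sample_data, that does nothing unless CGEN is installed. The column names of Xdata and the argument names passed to snp.logistic are assumptions recalled from the package's help pages rather than taken from this page, so check them against ?snp.logistic and ?Xdata before relying on them.

```r
# Hypothetical wrapper: runs the example only when CGEN is available.
fit_sample_data <- function() {
  if (!requireNamespace("CGEN", quietly = TRUE)) {
    message("CGEN is not installed; skipping the example.")
    return(invisible(NULL))
  }

  # Load the sample covariate data into this function's environment.
  data("Xdata", package = "CGEN", envir = environment())

  # Assumed column names: case.control (outcome), BRCA.status (0-1 genetic
  # factor), oral.years and n.children (covariates), ethnic.group (strata).
  fit <- CGEN::snp.logistic(Xdata, "case.control", "BRCA.status",
                            main.vars = c("oral.years", "n.children"),
                            int.vars = "oral.years",
                            strata.var = "ethnic.group")

  # Summary tables and a Wald test for the genetic main effect and interaction.
  print(CGEN::getSummary(fit))
  print(CGEN::getWaldTest(fit, c("BRCA.status", "BRCA.status:oral.years")))
  invisible(fit)
}
```

Calling fit_sample_data() after installing the package prints the summaries and returns the fitted object invisibly.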


Samsiddhi Bhattacharjee, Nilanjan Chatterjee and William Wheeler


Maximum-likelihood estimation under independence:

Chatterjee N, Carroll R. Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies. Biometrika 2005, 92(2):399-418.

Shrinkage estimation:

Mukherjee B, Chatterjee N. Exploiting gene-environment independence in analysis of case-control studies: An empirical Bayes approach to trade-off between bias and efficiency. Biometrics 2008, 64(3):685-94.

Mukherjee B et al. Tests for gene-environment interaction from case-control data: a novel study of type I error, power and designs. Genetic Epidemiology 2008, 32:615-26.

Chen YH, Chatterjee N, Carroll R. Shrinkage estimators for robust and efficient inference in haplotype-based case-control studies. Journal of the American Statistical Association 2009, 104:220-233.

Conditional logistic regression and adjustment for population stratification:

Chatterjee N, Kalaylioglu Z, Carroll R. Exploiting gene-environment independence in family-based case-control studies: Increased power for detecting associations, interactions and joint-effects. Genetic Epidemiology 2005, 28:138-156.

Bhattacharjee S, Wang Z, Ciampa J, Kraft P, Chanock S, Yu K, Chatterjee N. Using Principal Components of Genetic Variation for Robust and Powerful Detection of Gene-Gene Interactions in Case-Control and Case-Only studies. American Journal of Human Genetics 2010, 86(3):331-342.