
PCAmatchR: Match Cases to Controls Based on Genotype Principal Components

PCAmatchR is an open-source, R-based software package that performs optimal case-control matching for more accurate genome-wide association study (GWAS) analyses. Working from user-supplied principal components, PCAmatchR helps select controls that are well matched to cases by ancestry, which avoids biased association results caused by ancestry-based genetic differences between cases and controls.

PCAmatchR takes user-supplied PCA outputs and selects matching controls for cases using a weighted Mahalanobis distance metric, which weights each principal component by the percent of genetic variation it explains.
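The idea of a variance-weighted distance can be illustrated with a minimal sketch. This is not PCAmatchR's code or API: it is written in Python rather than R, the function names and toy data are hypothetical, the covariance between principal components is assumed to be diagonal, the weights are assumed to be the percent variance explained rescaled to sum to 1, and a simple greedy nearest-control search stands in for the optimal matching the package performs.

```python
from statistics import pvariance


def weighted_distance(a, b, variances, weights):
    """Variance-scaled, weighted distance between two PC score vectors.

    Assumes a diagonal covariance matrix, so each component is scaled
    by its own variance only.
    """
    return sum(
        w * (x - y) ** 2 / v
        for x, y, v, w in zip(a, b, variances, weights)
    ) ** 0.5


def greedy_match(cases, controls, pct_variance):
    """Assign each case its nearest unused control (1:1, greedy).

    cases, controls: dict mapping sample ID -> list of PC scores.
    pct_variance: percent of variation explained by each PC.
    Greedy matching is a simplification, not optimal matching.
    """
    total = sum(pct_variance)
    weights = [p / total for p in pct_variance]  # rescale to sum to 1

    # Per-component variance over all samples (cases and controls).
    all_scores = list(cases.values()) + list(controls.values())
    variances = [pvariance(column) for column in zip(*all_scores)]

    available = dict(controls)
    matches = {}
    for case_id, case_pcs in cases.items():
        if not available:
            break  # no controls left to assign
        best = min(
            available,
            key=lambda cid: weighted_distance(
                case_pcs, available[cid], variances, weights
            ),
        )
        matches[case_id] = best
        del available[best]  # match without replacement
    return matches


# Toy data: two PCs per sample, hypothetical values.
cases = {"case1": [0.10, 0.02], "case2": [-0.30, 0.05]}
controls = {
    "ctrl1": [0.12, 0.01],
    "ctrl2": [-0.28, 0.06],
    "ctrl3": [0.50, -0.40],
}
matches = greedy_match(cases, controls, pct_variance=[60.0, 15.0])
print(matches)
```

Because the first component carries a larger weight here, differences along it count more toward the distance than equally sized scaled differences along the second component. For real analyses, use the package itself and its documentation on CRAN.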

Software Download

Download PCAmatchR software and supporting documentation from CRAN


Brown DW, Myers TA, Machiela MJ. PCAmatchR: A flexible R package for optimal case-control matching using weighted principal components. Bioinformatics. 2021 May 23.


Questions? Contact Derek Brown or Mitchell Machiela.

