Utilizing Longitudinal Primary Care Patient Data to Conduct Epidemiologic Studies

By Victoria A. Fisher, M.P.H.

Shahinaz Gadalla with Arlene Gallagher and Wilhelmine Meeraus (epidemiologists with the UK Clinical Practice Research Datalink)

DCEG investigators are coordinating with the Clinical Practice Research Datalink (CPRD) to access observational data for millions of patients in clinics across the United Kingdom (UK). The CPRD is the world’s largest database of anonymized, longitudinal primary care medical records; it collects data from over 850 general practices, with historical data on over 22 million patients. Early validation studies showed that the clinics included in the database constituted a broadly representative sample of general practices across the UK.

“The CPRD is a rich resource for cancer epidemiology research,” said Shahinaz Gadalla, M.D., Ph.D., investigator in the Clinical Genetics Branch (CGB), who is leading DCEG efforts to access and utilize the database. As a governmental, not-for-profit research service, the CPRD is jointly funded by the UK’s National Health Service (NHS) National Institute for Health Research and the Medicines and Healthcare Products Regulatory Agency. The database contains patient demographics and lifestyle factors (e.g., smoking and alcohol consumption), clinical diagnoses, test results, immunization and cancer screening records, and full prescription data.

The CPRD primary care records can be linked to other resources, such as cancer and death registries and hospital admissions data. In essence, the CPRD mimics a prospective cohort study, without the need to recruit and retain participants and without the typically long interval between recruitment and ascertainment of cancers. The same can be said of other electronic medical record databases, but the CPRD is unique in that UK primary care physicians are the gatekeepers of patients’ medical care.

“Through primary care records, we can see the full spectrum of a disease or exposure,” Dr. Gadalla said. “In my work, using CPRD allowed me to capture the full age range of patients with myotonic dystrophy, rather than just the more severe subset we normally identify when using hospital records.”

Another advantage of the CPRD is that a new release of data is issued every month. “For example, data extracted in 2016 is current up to the time of the download,” she said. “It’s very dynamic and detailed, compared to other resources.”

Tapping into a valuable resource

For the past three years, Dr. Gadalla has worked with the CPRD research group, DCEG investigators, the NCI Office of Acquisitions, and the NCI Technology Transfer Center to ensure DCEG investigators have access to the data. “The first year was devoted to building the infrastructure, understanding the data, and putting the contractual agreement in place,” said Dr. Gadalla. “Now that we’re a few years out, I think the process for investigators is smooth.”

Dr. Gadalla leads the DCEG CPRD users’ committee, which meets regularly to discuss new proposals, address questions, and develop a repository of Read codes (the UK NHS standard scheme for describing the care and treatment of patients). Read codes allow for more specificity and detail than International Classification of Diseases (ICD) codes, said Dr. Gadalla. “Coding can get complex and interesting, so the code repository is a valuable resource.” In addition, several DCEG CPRD data users and committee members devoted time to developing common algorithms to extract covariates such as smoking, BMI, and alcohol use; these algorithms are currently used for all projects.

A number of studies using the CPRD data are underway. In CGB, Dr. Gadalla is investigating benign and malignant tumors among patients with myotonic dystrophy; itraconazole (a drug used to treat fungal infections) and risk of cancer; and nitrates, 5-alpha-reductase inhibitors, and hormone replacement therapy in relation to risk of upper gastrointestinal cancer (in collaboration with Maria Constanza Camargo, Ph.D., Metabolic Epidemiology Branch (MEB), and Dr. Christopher Cardwell, Queen’s University Belfast). Nicolas Wentzensen, M.D., Ph.D., Deputy Chief of CGB, is exploring prescription medications for cardiovascular disease and metabolic syndrome in relation to endometrial/ovarian cancer.

In the Infections and Immunoepidemiology Branch, Sam Mbulaiteye, MBChB, M.Phil., M.Med., is studying risk factors for Burkitt lymphoma, and Jill Koshiol, Ph.D., is investigating statin use and risk of biliary tract cancer.

Michael B. Cook, Ph.D., MEB, is exploring reproductive risk factors for prostate cancer incidence, mortality, and survival; and hormone therapy in relation to risks of second cancers and mortality among women and men. Katherine A. McGlynn, Ph.D., M.P.H., senior investigator in MEB, is investigating liver cancer and domperidone use; and oophorectomy and risk of primary liver cancer and fatty liver disease.

From the Biostatistics Branch, postdoctoral fellow Ana Best, Ph.D., used the database to test a statistical method for combining prevalent and incident cohort data to estimate the survivor function.

New avenues and next steps

The sheer wealth of data in the CPRD allows investigators to venture into new areas of study. “I’ve started some work in pharmacoepidemiology,” Dr. Gadalla said. “This field is rich and very important for cancer research. There is a lot to learn.”

Dr. Gadalla will host a seminar later this year to highlight a few DCEG studies that are using these data. In the meantime, investigators continue to move forward on several publications and to build collaborations to fully utilize this resource.

“It’s been an amazing experience,” Dr. Gadalla said. “I’ve learned a lot from the administrative perspective of leading this large effort, and I’m pleased that this work has contributed to a number of new collaborations within DCEG and beyond. I’m trying to encourage more fellows to use it, as it is such a nice resource for them. It’s exciting to see what is on the horizon.”
