As part of the Division of Cancer Epidemiology and Genetics’ mission to discover the causes of cancer and the means of its prevention, DCEG is committed to sharing research data to further advance science, improve public health, and leverage the investment made by the U.S. taxpayers in the Division’s research program.
Data sharing must be conducted in a manner consistent with federal, state, and local laws and policies concerning privacy, confidentiality, protection from discrimination, the honoring of informed consent, as well as other restrictions related to institutional review board (IRB) approvals. For some studies, there may be data that cannot be shared because of conditions placed upon DCEG by the individuals or organizations who supplied the data to DCEG for the original research.
DCEG has the highest commitment to protecting the identity of study participants against undesired intrusions (privacy) and to limiting access to study information that might individually identify them (confidentiality). All direct identifiers are omitted from data sets; however, because study data often include extremely detailed personal information (e.g., age, gender, race, residence, employment histories, medical histories), it may be possible in rare cases, through complex analysis and with outside information, to identify specific subjects. Therefore, DCEG will redact data as necessary to protect study subject privacy, particularly for data requested under the Freedom of Information Act (FOIA) mechanism (see Section III.D). Less data redaction may be necessary under Data Transfer Agreements (DTA) that contain provisions assuring that the researchers will not try to learn the identity of subjects or link the shared data with individually identifiable records from other sources (see Section III.C).
While DCEG is committed to data sharing, we believe the optimal time to provide data is after the cleaning and quality control processes have been completed and there is a final analytic data set. The reasons for this are related to NIH policy and data quality. Prior to this point, study data sets cannot be considered “final.” Throughout data collection and analysis, various actions are undertaken to detect out-of-range, illogical, and inconsistent data, to check original sources, and to correct data as needed. Because this process can continue up to the time a paper is accepted for publication, sharing data prior to that time is not advisable.
In response to outside requests, DCEG will provide the necessary documentation (e.g., questionnaires, coding manuals, data dictionaries) to understand and analyze study data. Further assistance from DCEG investigators, however, may not be possible due to time constraints and the competing demands of their positions. Researchers requesting data through mechanisms other than a mutually agreed-upon scientific collaboration with a DCEG investigator should not expect assistance beyond the provision of standard study documentation.
Due to fiscal constraints, DCEG reserves the right to require payments to cover the cost of producing data sets and accompanying documentation.
There are various methods to access data, depending on whether there are established study-specific application and review procedures, whether the requestor prefers to collaborate with DCEG investigators or to work independently, and whether special precautions are needed to protect the privacy of study subjects or the confidentiality of study data. Depending on the data elements of interest, there may be special considerations affecting requests from investigators outside the U.S.
Epidemiologic data: Several of DCEG’s large prospective cohort studies have established procedures whereby researchers can request data, and some have publicly available data posted on their study websites. Some have regularly scheduled calls for proposals, while others accept proposals or more general requests for data on an ongoing basis. Typically, researchers must submit proposals detailing their research plans. Information on the following studies can be found at these websites:
Other DCEG studies have published findings in journals that require access to data that underlie the results described in their manuscripts. Most DCEG studies involve human subjects, and the protection of study participants and their private information is required by law. The DCEG Data Repository Committee accepts requests for these published data sets, and reviews the data sets prior to release to ensure that confidentiality of individuals participating in DCEG studies is maintained.
Genomic Data: In accordance with the NIH Genomic Data Sharing (GDS) Policy, qualified genomic data generated by DCEG will generally be made available through an NIH-designated repository such as the National Center for Biotechnology Information’s Database of Genotypes and Phenotypes (dbGaP) and/or the Sequence Read Archive (SRA), and the NCI’s Genomic Data Commons (GDC). Researchers may obtain controlled-access data only with the permission of the NCI Data Access Committee (DAC), which operates according to the policies established by the NIH. For controlled-access data, the requestor must stipulate in a Data Use Certification that they will comply with federal, state, and local policy; will only use the data for the specified research use; will not identify study participants; will not transfer the data; and will abide by other provisions. An institute official must vouch that the requestor is a bona fide researcher, and must also stipulate to the provisions related to privacy and use of the data. Additionally, an IT Director at the requesting institution must be identified to vouch that the data are being stored and maintained in a secure fashion. The federal Genetic Information Nondiscrimination Act, which makes it illegal for health insurance companies, group health plans, and most employers to discriminate based on genetic information, is also relevant to these requests.
Many researchers interested in accessing data from DCEG studies prefer to do so by establishing a scientific collaboration with DCEG investigators. Through these formal collaborations, the DCEG team members can share their expertise and knowledge of the intricacies of the study data, which greatly enhances the research endeavor. A collaborative agreement is developed which details the data to be shared, the roles and responsibilities of the parties involved, and consistency of the proposed research with the study informed consent and IRB approvals.
Researchers should contact the appropriate DCEG investigator directly to determine if a mutually agreeable scientific collaboration can be established.
DCEG data for collaborative and non-collaborative research will be released under the terms of a Data Transfer Agreement, which must be signed by the DCEG Director, the recipient scientist, and an official representing the recipient scientist’s institution. The agreement will specify the proposed research plan, identify who will have access to the data, and contain provisions related to protecting privacy. These provisions will require that: 1) the requesting researcher will neither attempt to link nor permit others to link the data set with individually identifiable records from any other data set, and 2) the researcher will not attempt to use the data set or permit others to use it to learn the identity of any person. In addition, the relevant NIH IRB must provide an assurance that the proposed research and data transfer are consistent with informed consent. If the agreement is for non-collaborative research, DCEG investigators will not be able to assist the requestor beyond provision of standard study documentation.
Researchers should contact the appropriate DCEG investigator directly to request data and develop a Data Transfer Agreement.
FOIA requires federal agencies to provide access to agency records except to the extent that such records are protected from public disclosure by one of nine exemptions or three law enforcement exclusions. The exemptions that most often apply to DCEG records are those that prohibit disclosure of records that are: 1) specifically exempted by another federal law (e.g., the Privacy Act of 1974) (Exemption 3); 2) privileged inter-agency or intra-agency communications that are pre-decisional and part of an agency’s deliberative process (Exemption 5); and 3) personnel, medical, or similar files, release of which would constitute a clearly unwarranted invasion of personal privacy (Exemption 6). Requestors should note that because there are no restrictions placed on use or further dissemination of records obtained through a FOIA request, DCEG takes special care to protect subjects from invasion of privacy that might result from linking the data set to individually identifiable records from any other data set. FOIA allows the agency to recover part of the cost of responding to a request. DCEG investigators will not be able to assist the requestor to use or interpret the data beyond provision of standard study documentation. Details on FOIA are available at the Freedom of Information Act Office.
FOIA requests for DCEG records should be submitted to the National Cancer Institute FOIA Coordinator, Building 31, Room 10A48, 9000 Rockville Pike, Bethesda, MD 20892.