2020 Winners of the DCEG Informatics Tool Challenge
by DCEG Staff
Six projects were funded through the 2020 DCEG Informatics Tool Challenge. Since its establishment in 2014, the competitive program has provided support for innovative approaches to epidemiological methods, data collection, analysis, and other research efforts using modern technology and informatics.
LDexpress
Shu Hong Lin and Mitchell Machiela
LDexpress is a new module that will be integrated into the LDlink webtool and will provide a user-friendly interface for querying gene expression associated with user-provided SNPs and other SNPs in linkage disequilibrium with them. The module will enable researchers to more efficiently identify multiple types of quantitative trait loci (QTL), including expression QTL (eQTL), splicing QTL (sQTL), cell type interaction expression QTL (ieQTL), and cell type interaction splicing QTL (isQTL), and to better understand the underlying genetic structure of gene expression. Such information will facilitate hypothesis generation for genetic loci identified in genome-wide association studies of diseases and traits.
COMETS Explorer
Steven Moore, Erikka Loftfield, Krista Zanetti, Ewy Mathé, Ella Temprosa and Mei Liu
The Consortium of Metabolomics Studies (COMETS) was founded by DCEG and Division of Cancer Control and Population Sciences investigators to facilitate the conduct of high-impact, well-powered metabolomics studies in diverse populations. COMETS Explorer is a new template that will allow any public user to conduct sophisticated queries of aggregate COMETS cohort data.
sparrpowR
Derek Brown, Ian Buller, Timothy Myers, Rena Jones and Mitchell Machiela
sparrpowR is a flexible R package and webtool to estimate statistical power of environmental epidemiologic studies to detect spatial clustering of cancer cases in a geographic area of interest. The tool will enable the development of more accurate, cost-effective spatial study designs by allowing investigators to assess study power both in the study design phase and after study completion.
PurityNGS
Tongwu Zhang, Jian Sang, David Wedge and Maria Teresa Landi
Tumor tissues usually consist of a mixture of tumor clones, normal epithelium and stromal cells, which can produce low tumor purity and affect the detection of tumor alterations. Thus, it is important to correctly assess tumor purity before conducting any downstream analyses. PurityNGS is software that will visualize and estimate tumor purity, ploidy, and clonal architecture by integrating somatic copy number alterations, single nucleotide variants and cancer cell fraction.
FORGE2-TF
Charles Breeze, Sonja Berndt, Sue Pan and Mei Liu
FORGE2-TF is a comprehensive resource and web tool to link genome-wide association study (GWAS) and transcription factor data. By cataloging over 223 million transcription factor motif matches in the human genome, in context with epigenomic mapping data, it will provide researchers insight into GWAS variants with potential regulatory action. This tool can save researchers substantial time by providing both locus-specific and parallel search functionalities, allowing them to prioritize a subset of variants for experimental analysis.
Developing machine learning methods for cross-walking occupations from deprecated coding systems into current coding systems
Daniel Russ, Jonas Almeida and Melissa Friesen
A crucial first step in assessing occupational risk factors is categorizing jobs held by study participants into standardized occupational classification groups, a task performed by the Standardized Occupation Coding for Computer-assisted Epidemiologic Research (SOCcer) software. However, multiple classification systems exist across time and geographical region, and variation in coding systems hampers our ability to pool occupational groupings across studies without re-coding those jobs to a common system. This tool will apply machine learning methods to facilitate cross-walking codes from one system to another, which will improve the performance of SOCcer as well as increase the efficiency and feasibility of incorporating occupational risk factors in large-scale, international epidemiological studies.
Proposals are evaluated for their novel approach to specific research needs, the feasibility of completing the project within one year of initiation, and cost, which may not exceed $20,000. Reviewers consider technical feasibility, utility to epidemiologic and genetic research, and alignment with the Division’s mission.