SegCNV ReadMe
*******************************************************************************************
** SegCNV
**
**
** 2011-1-18
*******************************************************************************************
## Introduction
SegCNV is a software package, implemented in C++, to detect germline copy number variations (CNVs) in SNP array data.
Currently, SegCNV supports Illumina 550K and 610K genotyping platforms. We are working to include more platforms.
## Installation
If the precompiled binary program segcnv works on your platform, then
1. Create a work folder, WORK, and copy the binary file segcnv into it;
2. Copy the folder configuration (and all files in it) to WORK.
If the precompiled binary program segcnv does not work on your platform, run
"g++ Single.Seg.CNV.cpp -o segcnv" to build the executable segcnv, and then copy
it to the folder WORK.
## Usage
Preprocessing of the intensity file:
We strongly encourage you to first remove the wave effect that is correlated with GC content.
Data format:
Each intensity file (one per subject) has three columns: probe name, BAF (B allele frequency) and LRR (log R ratio).
We use one example to illustrate the usage of segcnv: calling CNVs in ten HapMap samples genotyped on the Illumina 550K platform.
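For orientation, the three-column layout can be read with a few lines of C++. This is a minimal sketch, not SegCNV's own reader: it assumes whitespace-separated columns in the order probe name, BAF, LRR, and the names `Probe` and `readIntensity` are hypothetical. Lines that do not yield a name followed by two numbers, such as a text header, are skipped.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one probe (not part of SegCNV).
struct Probe {
    std::string name;  // probe name, e.g. an rs ID
    double baf;        // B allele frequency
    double lrr;        // log R ratio
};

// Reads "probe_name BAF LRR" records until the stream is exhausted.
// Assumption: columns are whitespace-separated; any line that does not
// parse as a name plus two numbers (e.g. a header line) is skipped.
std::vector<Probe> readIntensity(std::istream& in) {
    std::vector<Probe> probes;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Probe p;
        if (fields >> p.name >> p.baf >> p.lrr) {
            probes.push_back(p);
        }
    }
    return probes;
}
```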
To run the program, type "./segcnv --callcnv --filelist filelist --outfile outfile".
Here, filelist gives one intensity file name and its subject ID per line. In this example, the filelist file is:
./Hapmap/NA06991.GCA NA06991
./Hapmap/NA06993.GCA NA06993
./Hapmap/NA06994.GCA NA06994
./Hapmap/NA07029.GCA NA07029
./Hapmap/NA07345.GCA NA07345
./Hapmap/NA07348.GCA NA07348
./Hapmap/NA07357.GCA NA07357
./Hapmap/NA10830.GCA NA10830
./Hapmap/NA10835.GCA NA10835
./Hapmap/NA10847.GCA NA10847
"outfile" specifies the output file.
By default, the program searches for all CNVs covered by at least 3 probes. Other settings can be specified; for example,
"./segcnv --callcnv --filelist filelist --platform 1 -chrid 2 -min 5 --outfile outfile" calls CNVs covered by at least 5
probes, only on chromosome 2, for data produced on the Illumina 610K platform. The full usage is:
usage: ./segcnv <flags (optional)>
flags:
1. Platform
< --platform > Currently supports two platforms:
0 Illumina.550K (Default)
1 Illumina.610K
2. CNV Calling
--callcnv Call CNVs for each sample.
--filelist Filelist with File_Name, sample_ID.
< -chrid > Call CNVs only on the chromosome specified by chrid. By default, all 22 autosomes are analyzed.
< -min > Minimum number of probes covered by a CNV. Default is 3.
3. Out File
--outfile Output file.
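The flags and their defaults can be summarized as a small options struct. This is a hypothetical C++ sketch for illustration, not SegCNV's actual argument parser: the names `Options` and `parseOptions` are invented, and using 0 to mean "all chromosomes" is an assumption of the sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the documented segcnv options and defaults.
struct Options {
    bool callCnv = false;
    int platform = 0;   // 0 = Illumina 550K (default), 1 = Illumina 610K
    int chrId = 0;      // 0 = all 22 chromosomes (assumption of this sketch)
    int minProbes = 3;  // default minimum number of probes per CNV
    std::string fileList;
    std::string outFile;
};

// Walks argv, reading the value that follows each value-taking flag.
// Unknown flags are ignored.
Options parseOptions(int argc, const char* const argv[]) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        std::string flag = argv[i];
        bool hasValue = i + 1 < argc;
        if (flag == "--callcnv") {
            opt.callCnv = true;
        } else if (flag == "--platform" && hasValue) {
            opt.platform = std::atoi(argv[++i]);
        } else if (flag == "-chrid" && hasValue) {
            opt.chrId = std::atoi(argv[++i]);
        } else if (flag == "-min" && hasValue) {
            opt.minProbes = std::atoi(argv[++i]);
        } else if (flag == "--filelist" && hasValue) {
            opt.fileList = argv[++i];
        } else if (flag == "--outfile" && hasValue) {
            opt.outFile = argv[++i];
        }
    }
    return opt;
}
```

With this sketch, the 610K command shown earlier yields platform 1, chromosome 2 and a minimum of 5 probes, while omitted flags keep their documented defaults.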
## Output file
The output file has 14 columns:
* Sample ID
* Chromosome ID
* Segment starting probe order
* Segment ending probe order
* Total probes in the segment
* Segment starting probe bp location
* Segment ending probe bp location
* Segment length in bp (ending bp location - starting bp location + 1)
* Start probe name
* End probe name
* Copy number (0: homozygous deletion, 1: hemizygous deletion, 2: normal copy, 3: duplication)
* Score (Z score of the called segment based only on LRR)
* Score (Z score of the called segment integrating both LRR and BAF, for duplication segments)
* Expected false positive calls per genome scan at this threshold. Smaller values are more stringent and result in fewer calls.
Suggested value: 0.01 or 0.05. IMPORTANTLY, segcnv calls segments even with very modest evidence of CNV (|z| score > 3.5).
At this threshold, most calls are false positives, but many are true CNVs. Choose a threshold to filter out weak candidate CNVs.
## Example of two rows from an output file
NA06991 1 622 624 3 4441250 4444038 2789 rs11799990 rs350170 1 3.88739 3.88739 18.2598
NA06991 1 1635 1638 4 8503602 8539425 35824 rs6656249 rs1473420 1 2.91556 2.91556 18.2598
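A minimal C++ sketch for post-processing the output: it parses rows like the two above and keeps only calls whose expected false-positive count is at or below a chosen cutoff such as 0.01 or 0.05. It assumes the 14 columns are whitespace-separated, that the eighth field is the segment length in bp (as in the example rows, where it equals ending bp location - starting bp location + 1), and that chromosome IDs are integers. `CnvCall`, `parseCall` and `filterCalls` are hypothetical names, not part of SegCNV.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One row of the segcnv output file (14 whitespace-separated columns).
struct CnvCall {
    std::string sampleId;
    int chromosome = 0;                // assumed integer (autosomes 1-22)
    long startOrder = 0, endOrder = 0; // segment start/end probe order
    long nProbes = 0;                  // total probes in the segment
    long startBp = 0, endBp = 0;       // start/end probe bp locations
    long lengthBp = 0;                 // segment length in bp
    std::string startProbe, endProbe;
    int copyNumber = 2;
    double zLrr = 0.0;                 // Z score based only on LRR
    double zLrrBaf = 0.0;              // Z score integrating LRR and BAF
    double expectedFalsePositives = 0.0;
};

// Parses one output row; returns false if any column is missing or malformed.
bool parseCall(const std::string& line, CnvCall& c) {
    std::istringstream f(line);
    return static_cast<bool>(
        f >> c.sampleId >> c.chromosome >> c.startOrder >> c.endOrder
          >> c.nProbes >> c.startBp >> c.endBp >> c.lengthBp
          >> c.startProbe >> c.endProbe >> c.copyNumber
          >> c.zLrr >> c.zLrrBaf >> c.expectedFalsePositives);
}

// Keeps calls whose expected false-positive count is at or below the cutoff
// (a smaller cutoff is more stringent and keeps fewer calls).
std::vector<CnvCall> filterCalls(const std::vector<CnvCall>& calls, double cutoff) {
    std::vector<CnvCall> kept;
    for (const CnvCall& c : calls) {
        if (c.expectedFalsePositives <= cutoff) {
            kept.push_back(c);
        }
    }
    return kept;
}
```

Both example rows have an expected false-positive value of 18.2598, so both would be dropped at a cutoff of 0.05.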