segCNV ReadMe

** SegCNV
** 2011-1-18

## Introduction
SegCNV is a software package, implemented in C++, that detects germline copy number variations (CNVs) in SNP array data.
Currently, SegCNV supports the Illumina 550K and 610K genotyping platforms. We are working to include more platforms.

## Installation

   If the precompiled binary segcnv works on your platform:
       1. Create a work folder WORK and copy the binary file segcnv into it;
       2. Copy the folder configuration (with all files in it) to WORK.
   If the precompiled binary segcnv does not work on your platform, run
   "g++ Single.Seg.CNV.cpp -o segcnv" to build the segcnv executable, then copy
   it to the folder WORK.

## Usage

   Preprocessing of the intensity file:
   We strongly encourage you to remove the wave effect that correlates with GC content before calling CNVs.

   Data format:
   Each intensity file (one per subject) has three columns: probe name, BAF (B allele frequency) and LRR (log R ratio).
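
To make the three-column layout concrete, here is a minimal Python sketch of a reader that validates one intensity file. It is not part of SegCNV. It assumes the columns are whitespace-separated and that the file has no header line; neither detail is stated in this ReadMe. The names `ProbeRecord` and `read_intensity` are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class ProbeRecord(NamedTuple):
    probe: str   # probe name, e.g. an rs identifier
    baf: float   # B allele frequency
    lrr: float   # log R ratio


def read_intensity(handle: TextIO) -> Iterator[ProbeRecord]:
    """Yield one record per non-empty line of a three-column intensity file.

    Assumption: whitespace-separated columns and no header line.
    """
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, found {len(fields)}"
            )
        probe, baf, lrr = fields
        yield ProbeRecord(probe, float(baf), float(lrr))


# Small in-memory demonstration with made-up values.
demo = io.StringIO("rs0000001 0.50 -0.10\nrs0000002 1.00 0.25\n")
records = list(read_intensity(demo))
```

Running a check like this before calling CNVs catches truncated or misaligned files early.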

   We use one example to illustrate the usage of segcnv: calling CNVs for ten HapMap samples genotyped on the 550K platform.
   To run the program, type   "./segcnv --callcnv --filelist filelist --outfile outfile".
   Here, filelist is a file that lists each intensity file name followed by its subject ID, one subject per line. In this example, the filelist file is:
       ./Hapmap/NA06991.GCA   NA06991
       ./Hapmap/NA06993.GCA   NA06993
       ./Hapmap/NA06994.GCA   NA06994
       ./Hapmap/NA07029.GCA   NA07029
       ./Hapmap/NA07345.GCA   NA07345
       ./Hapmap/NA07348.GCA   NA07348
       ./Hapmap/NA07357.GCA   NA07357
       ./Hapmap/NA10830.GCA   NA10830
       ./Hapmap/NA10835.GCA   NA10835
       ./Hapmap/NA10847.GCA   NA10847
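
A filelist like the one above can be generated rather than typed by hand. The Python sketch below is an illustration only. It assumes the subject ID is the file name without its extension (as in the example) and that segcnv accepts any whitespace between the two columns. The function name `filelist_lines` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def filelist_lines(paths: Iterable[str]) -> List[str]:
    """Return one 'file_name   subject_ID' line per intensity file.

    Assumption: the subject ID equals the file name without its extension.
    """
    return [f"{p}   {PurePosixPath(p).stem}" for p in paths]


example = filelist_lines(["./Hapmap/NA06991.GCA", "./Hapmap/NA06993.GCA"])

# To write a filelist for every .GCA file in a folder, something like:
#   import glob
#   with open("filelist", "w") as out:
#       out.write("\n".join(filelist_lines(sorted(glob.glob("./Hapmap/*.GCA")))) + "\n")
```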

   "outfile" specifies the output file.

   By default, the program searches for all CNVs covered by at least 3 probes. Other settings can be specified with optional flags. For example,
   "./segcnv --callcnv --filelist filelist --platform 1 -chrid 2 -min 5 --outfile outfile" calls CNVs covered by at least 5
   probes, only on chromosome 2, for data produced on the Illumina 610K platform. The full usage is:

        usage: ./seg_cnv  <flags (optional)>
            1. Platform
            < --platform >  Currently support two platforms:
                0      Illumina.550K (Default)
                1      Illumina.610K
            2. CNV Calling
            --callcnv       Call CNV for each sample.
            --filelist      Filelist with File_Name, sample_ID.
            < -chrid >      Call CNV on the chromosome chrid specifies. Default analyzes all 22 chromosomes.
            < -min >        Min number of probes covered by the CNV. Default is 3.
            3. Out File
            --outfile       Out file.
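
When segcnv is driven from a script, it can help to assemble the command from named options. The Python sketch below uses only the flags listed in the usage text above; the function name `segcnv_command` and its parameter names are hypothetical, and the binary is assumed to sit in the current folder as `./segcnv`. `shlex.join` requires Python 3.8 or later.

```python
import shlex
from typing import List, Optional


def segcnv_command(
    filelist: str,
    outfile: str,
    platform: Optional[int] = None,    # 0 = Illumina 550K, 1 = Illumina 610K
    chrid: Optional[int] = None,       # restrict calling to one chromosome
    min_probes: Optional[int] = None,  # minimum probes per CNV
    binary: str = "./segcnv",          # assumed location of the executable
) -> List[str]:
    """Build the argument list for a CNV-calling run; omitted options use segcnv defaults."""
    cmd = [binary, "--callcnv", "--filelist", filelist]
    if platform is not None:
        if platform not in (0, 1):
            raise ValueError("platform must be 0 (550K) or 1 (610K)")
        cmd += ["--platform", str(platform)]
    if chrid is not None:
        cmd += ["-chrid", str(chrid)]
    if min_probes is not None:
        cmd += ["-min", str(min_probes)]
    cmd += ["--outfile", outfile]
    return cmd


cmd = segcnv_command("filelist", "outfile", platform=1, chrid=2, min_probes=5)
print(shlex.join(cmd))

# To actually run it (requires the segcnv binary and configuration folder):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.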

## Output file

   The output file has 14 columns:

* Sample ID
* Chromosome ID
* Segment starting probe order
* Segment ending probe order
* Total probes in the segment
* Segment starting probe bp location
* Segment ending probe bp location
* Segment length in bp (ending bp location minus starting bp location, plus 1)
* Start probe name
* End probe name
* Copy number (0: homozygous deletion, 1: hemizygous deletion, 2: normal copy, 3: duplication)
* Score (Z score of the called segment, based only on LRR)
* Score (Z score of the called segment, integrating both LRR and BAF for duplication segments)
* Expected number of false positive calls per genome scan at this threshold. Smaller values are more stringent and yield fewer calls.
         Suggested value: 0.01 or 0.05. IMPORTANT: segcnv calls segments even with very modest evidence of CNV (|z| score > 3.5).
         At this threshold, most calls are false positives, but many are true CNVs. Choose a threshold to filter out weak candidate CNVs.

   Example of two rows from an output file:
NA06991 1       622     624     3       4441250 4444038 2789    rs11799990      rs350170        1       3.88739 3.88739 18.2598
NA06991 1       1635    1638    4       8503602 8539425 35824   rs6656249       rs1473420       1       2.91556 2.91556 18.2598
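
Filtering on the last column can be done with a few lines of Python. The sketch below is a post-processing illustration, not part of SegCNV. It assumes whitespace-separated rows without a header line and keeps calls whose expected false positive value is at or below the chosen threshold. The field names in `CnvCall` are hypothetical labels for the 14 columns, and the eighth field is read as the segment length in bp, which matches the two example rows.

```python
from typing import Iterable, List, NamedTuple


class CnvCall(NamedTuple):
    sample_id: str
    chrom: str
    start_index: int     # segment starting probe order
    end_index: int       # segment ending probe order
    n_probes: int
    start_bp: int
    end_bp: int
    length_bp: int       # assumed: end_bp - start_bp + 1
    start_probe: str
    end_probe: str
    copy_number: int
    z_lrr: float         # Z score from LRR only
    z_lrr_baf: float     # Z score from LRR and BAF
    expected_fp: float   # expected false positives per genome scan


def parse_call(line: str) -> CnvCall:
    """Parse one output row into a CnvCall."""
    f = line.split()
    if len(f) != 14:
        raise ValueError(f"expected 14 columns, found {len(f)}")
    return CnvCall(f[0], f[1], int(f[2]), int(f[3]), int(f[4]), int(f[5]),
                   int(f[6]), int(f[7]), f[8], f[9], int(f[10]),
                   float(f[11]), float(f[12]), float(f[13]))


def filter_calls(lines: Iterable[str], max_expected_fp: float = 0.05) -> List[CnvCall]:
    """Keep calls at or below the expected false positive threshold."""
    calls = (parse_call(line) for line in lines if line.strip())
    return [c for c in calls if c.expected_fp <= max_expected_fp]


# The two example rows shown above.
EXAMPLE_ROWS = [
    "NA06991 1 622 624 3 4441250 4444038 2789 rs11799990 rs350170 1 3.88739 3.88739 18.2598",
    "NA06991 1 1635 1638 4 8503602 8539425 35824 rs6656249 rs1473420 1 2.91556 2.91556 18.2598",
]
strict = filter_calls(EXAMPLE_ROWS, max_expected_fp=0.05)
```

Both example rows carry an expected false positive value of 18.2598, so neither passes the suggested thresholds of 0.01 or 0.05.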