- Dataset category
Copy Number Variation
Copy number variation data obtained from short-read whole genome sequencing of 48,874 Japanese individuals
Tadaka et al. 
- Samples analyzed
Blood (buffy coat), saliva
- Analysis method
1. Start from the CRAM files created by the 54KJPN-SNV/INDEL analysis; the following steps are applied to these files.
2. Randomly select 200 samples for each combination of sequencer and sequencing institution.
3. Run the GATK CNV Germline Cohort Workflow on each set of 200 samples selected in step 2.
   The CNV analysis bins are obtained by cutting the non-N regions of each chromosome every 1 kbp.
4. Divide all samples into groups of 200 for each combination of sequencer and sequencing institution.
5. Run the GATK CNV Germline Case Workflow on each group created in step 4, using the Panel of Normals created in step 3.
6. For each sample processed in step 5, count the amplification regions and the loss regions across all autosomes.
7. Use the interquartile range (IQR) to remove samples whose amplification or loss counts are outliers.
   The upper limit is the 75th percentile plus 1.5 times the IQR, and the lower limit is the 25th percentile minus 1.5 times the IQR.
8. From the samples remaining after step 7, remove those that are not included in 54KJPN-SNV/INDEL.
9. For each bin, calculate the number of samples at each copy number (CN).
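
The binning, the IQR-based outlier filter, and the per-bin counting described in the procedure above can be illustrated with a minimal Python sketch. It is not the pipeline used to build the dataset and it does not call GATK. The function names, the input layouts, the half-open coordinates, the handling of a short final bin, the percentile interpolation method, and the choice to test amplification and loss counts independently are all assumptions made for illustration.

```python
from collections import Counter
from statistics import quantiles


def make_bins(non_n_regions, bin_size=1000):
    """Cut each non-N region (chrom, start, end), half-open, into bins of bin_size bp.

    Assumption: a region's last bin may be shorter than bin_size.
    """
    bins = []
    for chrom, start, end in non_n_regions:
        for pos in range(start, end, bin_size):
            bins.append((chrom, pos, min(pos + bin_size, end)))
    return bins


def iqr_limits(values, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR).

    Assumption: quartiles use the 'inclusive' interpolation method;
    the source does not say which percentile definition was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def keep_non_outliers(counts):
    """Keep samples whose amplification and loss counts are both within limits.

    counts: {sample_id: (n_amplification_regions, n_loss_regions)} (hypothetical layout).
    """
    amp_lo, amp_hi = iqr_limits([amp for amp, _ in counts.values()])
    loss_lo, loss_hi = iqr_limits([loss for _, loss in counts.values()])
    return [
        sample
        for sample, (amp, loss) in counts.items()
        if amp_lo <= amp <= amp_hi and loss_lo <= loss <= loss_hi
    ]


def samples_per_cn(cn_by_sample):
    """Count, for every bin index, how many samples carry each copy number.

    cn_by_sample: {sample_id: [CN of bin 0, CN of bin 1, ...]} (hypothetical layout).
    """
    per_bin = {}
    for calls in cn_by_sample.values():
        for bin_index, cn in enumerate(calls):
            per_bin.setdefault(bin_index, Counter())[cn] += 1
    return per_bin
```

For example, `keep_non_outliers` applied to six samples with amplification counts 10, 11, 12, 12, 13, and 50 (and unremarkable loss counts) drops only the sample with 50 amplifications, because the upper limit under these assumptions is 12.75 + 1.5 × 1.5 = 15.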