3.15. JSV1
- Dataset category
Genome Variation
- Summary
Allele and genotype frequency data of structural variations obtained from long-read whole genome sequencing of 222 Japanese individuals
- References
Otsuki et al. [3]
- Samples
Activated T-lymphocytes
- Number of samples
333 samples (Male: 161, Female: 172) (111 trios)
- Analysis platform
PromethION (Flowcell R9.4.1 (cat# FLO-PRO002))
- Sample preparation
Genomic DNA was extracted from activated T cells using a Gentra Puregene Blood Kit (Qiagen) and sheared using a 29-gauge needle to obtain DNA fragments of the appropriate size. Two micrograms of the DNA fragments were subjected to library preparation (Ligation Sequencing Kit; cat# SQK-LSK109).
- Bioinformatics pipeline
Base calling: Guppy 4.2.2 (hac mode)
Quality filtering: reads with a mean quality score > 6 were used after cropping 100 bp from both the head and the tail of each read
Read alignment: LRA (version 2.17-r941) with the option -ONT
SV call:
CuteSV (version 1.0.9) with the -min_sv_length 50 option
The individual calls were merged using SURVIVOR (version 1.0.6) with the options 1000 1 1 -1 -1 -1
Joint calling was then performed using CuteSV
This pipeline is largely based on the official pipeline provided by Oxford Nanopore Technologies: https://github.com/nanoporetech/pipeline-structural-variation/releases/tag/v2.0.2
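The quality-filtering step in the pipeline above can be illustrated with a minimal Python sketch. It is not the code used to build this dataset. It assumes Phred+33 quality strings as found in FASTQ files, and it assumes that the mean quality is the arithmetic mean of the per-base Phred scores computed on the cropped read. The source specifies neither of these points, and the function names are hypothetical.

```python
CROP = 100        # bases removed from each end of a read (from the text)
MIN_MEAN_Q = 6.0  # reads must have a mean quality strictly above this (from the text)


def mean_phred(quality: str) -> float:
    """Arithmetic mean of Phred+33 scores (assumed definition of mean quality)."""
    return sum(ord(c) - 33 for c in quality) / len(quality)


def crop_and_filter(seq: str, quality: str):
    """Return the cropped (seq, quality) pair, or None if the read is rejected."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Reads no longer than 2 * CROP have nothing left after cropping.
    if len(seq) <= 2 * CROP:
        return None
    seq_c, qual_c = seq[CROP:-CROP], quality[CROP:-CROP]
    # Assumption: the threshold is applied to the cropped read.
    if mean_phred(qual_c) > MIN_MEAN_Q:
        return seq_c, qual_c
    return None
```

Tools designed for nanopore reads often average the error probabilities rather than the Phred scores themselves, which gives a lower mean for reads of uneven quality. If that definition is preferred, only `mean_phred` needs to change.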
- Allele frequency (AF) estimation and Mendelian inheritance error (MIE) analysis
We first constructed an SV dataset (repository) from the SVs identified in all 333 participants. We then extracted the SVs observed in the 222 unrelated individuals (i.e., the fathers and mothers) from the repository and estimated the AF from these individuals only. Restricting the estimate to the parents avoids double counting SVs shared between parents and their offspring and thus prevents overestimation of the allele frequencies. We also evaluated the MIE rates, taking advantage of the trio-based design.
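The AF estimation and MIE check described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used to produce the dataset. It assumes diploid autosomal genotypes encoded as the number of alternative alleles (0, 1 or 2, with None for a missing call). The encoding, the handling of missing calls, and all function names are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[int]  # alt-allele count 0, 1 or 2; None means no call
Trio = Tuple[Genotype, Genotype, Genotype]  # (father, mother, child)


def allele_frequency(parent_genotypes: Sequence[Genotype]) -> Optional[float]:
    """AF over unrelated individuals only (parents), ignoring missing calls."""
    called = [g for g in parent_genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))  # two alleles per diploid individual


def transmittable(g: int) -> set:
    """Alt-allele counts (0 or 1) that a parent with genotype g can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[g]


def is_mendelian_error(father: Genotype, mother: Genotype,
                       child: Genotype) -> Optional[bool]:
    """True if the child genotype cannot arise from the parents; None if any call is missing."""
    if None in (father, mother, child):
        return None
    possible = {a + b for a in transmittable(father) for b in transmittable(mother)}
    return child not in possible


def mie_rate(trios: Sequence[Trio]) -> Optional[float]:
    """Fraction of fully genotyped trios that show a Mendelian inheritance error."""
    checks = [is_mendelian_error(*t) for t in trios]
    informative = [c for c in checks if c is not None]
    if not informative:
        return None
    return sum(informative) / len(informative)
```

For a single SV, `allele_frequency` would be given the genotypes of the fathers and mothers only, whereas `mie_rate` would be given one (father, mother, child) tuple per trio. Sex chromosomes would need separate handling that this sketch does not attempt.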
- Related pages on jMorp website