3.12. JSV1

Dataset category

Genome Variation

Summary

Allele and genotype frequency data of structural variations obtained from long-read whole genome sequencing of 222 Japanese individuals

References

Otsuki et al. [3]

Samples

Activated T-lymphocytes

Number of samples

333 samples (Male: 161, Female: 172) (111 trios)

Analysis platform

PromethION (Flowcell R9.4.1 (cat# FLO-PRO002))

Sample preparation

Genomic DNA was extracted from activated T cells using a Gentra Puregene Blood Kit (Qiagen) and sheared using a 29-gauge needle to obtain DNA fragments of the appropriate size. Two micrograms of the DNA fragments were subjected to library preparation (Ligation Sequencing Kit; cat# SQK-LSK109).

Bioinformatics pipeline

Base calling: Guppy 4.2.2 (hac mode)
Quality filtering: reads with a mean quality score > 6 were used after cropping 100 bp from both their head and tail
Read alignment: LRA (version 2.17-r941) with the option -ONT
SV call:
- CuteSV (version 1.0.9) with the -min_sv_length 50 option
- The individual calls were merged using SURVIVOR (version 1.0.6) with the option 1000 1 1 -1 -1 -1
- The joint call was conducted using CuteSV software

This pipeline is largely based on the official pipeline provided by Oxford Nanopore Technologies: https://github.com/nanoporetech/pipeline-structural-variation/releases/tag/v2.0.2
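The quality-filtering step listed above can be illustrated with a minimal Python sketch. It is not the code used to build the dataset. The function names are hypothetical, and two details are assumptions because the description does not state them: the mean quality is computed by averaging per-base error probabilities (rather than averaging Phred scores directly), and it is evaluated on the whole read before cropping.

```python
import math


def mean_quality(qualities):
    """Mean Phred quality, averaged in error-probability space (assumption)."""
    if not qualities:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)


def crop_and_filter(seq, qualities, crop=100, min_mean_q=6.0):
    """Return the cropped (seq, qualities) pair, or None if the read is rejected.

    Assumption: the quality threshold is applied to the uncropped read.
    """
    # Reads too short to survive cropping at both ends are dropped.
    if len(seq) <= 2 * crop:
        return None
    # Keep only reads whose mean quality is strictly greater than the threshold.
    if mean_quality(qualities) <= min_mean_q:
        return None
    return seq[crop:-crop], qualities[crop:-crop]
```

A 300 bp read with uniform quality 10 would be kept and reduced to its central 100 bp, while the same read with uniform quality 5 would be discarded.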

Allele frequency (AF) estimation and Mendelian inheritance error (MIE) analysis

We first constructed an SV dataset (repository) from the SVs identified in all 333 participants. We then extracted the SVs observed in the 222 unrelated individuals (i.e., the fathers and mothers) from the repository and estimated AF. Restricting the estimate to the parents avoids double counting SVs shared between parents and offspring and thus prevents overestimation of the allele frequencies. We also evaluated the MIE rates, taking advantage of the trio-based design.
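As a rough illustration of the two calculations described above, the following Python sketch estimates AF from parental genotypes only and flags Mendelian inheritance errors in a trio. It is a simplified sketch, not the authors' implementation. It assumes a biallelic, autosomal, diploid SV site, and the genotype encoding (tuples of 0 for the reference allele and 1 for the SV allele, with None for a missing call) and the function names are hypothetical.

```python
def allele_frequency(parent_genotypes):
    """Alternate-allele frequency over unrelated individuals (parents only).

    Genotypes containing None (missing calls) are skipped.
    """
    called = [g for g in parent_genotypes if None not in g]
    if not called:
        return None
    alt_alleles = sum(a + b for a, b in called)
    return alt_alleles / (2 * len(called))


def is_mendelian_error(father, mother, child):
    """True if the child's genotype cannot be formed by taking one allele
    from each parent (autosomal, biallelic site assumed)."""
    possible = {tuple(sorted((f, m))) for f in father for m in mother}
    return tuple(sorted(child)) not in possible


# Hypothetical genotypes for four parents at one SV site.
parents = [(0, 1), (1, 1), (0, 0), (0, 0)]
print(allele_frequency(parents))                     # 3 SV alleles out of 8
print(is_mendelian_error((0, 0), (0, 0), (0, 1)))    # child carries an allele neither parent has
```

Excluding the offspring from `parent_genotypes` is what keeps an SV inherited within a trio from being counted more than once. Sites on sex chromosomes would need separate handling that this sketch does not cover.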

Related pages on jMorp website