3

I am working with a bamfile of paired-end whole genome sequencing, and want to filter out reads from a specific genomic region that are not mapped in a proper pair (these sometimes indicate a structural variant). I am using samtools, and tried to filter the reads with the "flag" option, to select for reads that are not mapped in a proper pair. If I'm correct, these reads should not have 2 in their flag value. (https://broadinstitute.github.io/picard/explain-flags.html)

However, according to samtools none of my reads are mapped in a proper pair. When I count (-c) all the reads in the region I specified, without filter, it gives me a total count of 179:

samtools view input.bam "8:113483114-113483213"  -c

179

When I filter for reads that are properly paired, i.e. flag contains "2" (-f 2), the count is zero:

samtools view input.bam "8:113483114-113483213"  -f 2 -c

0

I checked to see if the reads are recognized to be paired (-f 1), and if the mates are mapped (-F 8), and all of them are:

samtools view input.bam "8:113483114-113483213"  -f 1 -F 8  -c
179

I also tried for other regions in the genome and encounter the same problem everywhere. I checked the regions in IGV using the same BAM files, and IGV tells me most of the reads are properly paired, and only a few aren't. Does anyone know what is going on here? Were the BAM files flagged in a way that doesn't account for proper mate mapping?

Any help is welcome! Thanks a lot.

Lisa
  • 31
  • 1
  • 3
    Found the answer: turned out the aligner (BWA) was set in such a way that it never adds a properly paired flag. It can be fixed using ppflag-fixer: https://github.com/mskcc/htstools – Lisa Dec 27 '17 at 18:26

0 Answers0