I am working with a bamfile of paired-end whole genome sequencing, and want to filter out reads from a specific genomic region that are not mapped in a proper pair (these sometimes indicate a structural variant). I am using samtools, and tried to filter the reads with the "flag" option, to select for reads that are not mapped in a proper pair. If I'm correct, these reads should not have 2 in their flag value. (https://broadinstitute.github.io/picard/explain-flags.html)
However, according to samtools none of my reads are mapped in a proper pair. When I count (-c) all the reads in the region I specified, without filter, it gives me a total count of 179:
samtools view input.bam "8:113483114-113483213" -c
179
When I filter for reads that are properly paired, i.e. flag contains "2" (-f 2), the count is zero:
samtools view input.bam "8:113483114-113483213" -f 2 -c
0
I checked to see if the reads are recognized to be paired (-f 1), and if the mates are mapped (-F 8), and all of them are:
samtools view input.bam "8:113483114-113483213" -f 1 -F 8 -c
179
I also tried for other regions in the genome and encounter the same problem everywhere. I checked the regions in IGV using the same BAM files, and IGV tells me most of the reads are properly paired, and only a few aren't. Does anyone know what is going on here? Were the BAM files flagged in a way that doesn't account for proper mate mapping?
Any help is welcome! Thanks a lot.