I've quickly realized that bioinformatics is not a field whose terms are clearly defined or easy to look up. I have an apparent discrepancy in some of my results.
I used samtools view -b -h -f 8 fileName.bam > mateUnmapped.bam
on several BAM files. My understanding is that this command extracts only reads whose mate does not align to the draft genome (-f 8 keeps reads with the "mate unmapped" flag set, -h includes the header, and -b writes the output in BAM format).
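To make my understanding of the filter concrete, here is a minimal Python sketch of the bitwise test I believe -f and -F perform on each read's FLAG field. The bit values come from the SAM specification; the function name passes_view_filter and the example flags are made up for illustration and are not part of samtools.

```python
# SAM flag bits (values from the SAM specification)
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def passes_view_filter(flag, require=0, exclude=0):
    """Model of `samtools view -f require -F exclude` for one FLAG value.

    A read is kept when every bit in `require` is set and no bit in
    `exclude` is set. This is a hypothetical helper, not a samtools API.
    """
    return (flag & require) == require and (flag & exclude) == 0


# Illustrative flags (not taken from my data)
examples = {
    73: "paired, read mapped, mate unmapped, first in pair",
    77: "paired, read unmapped, mate unmapped, first in pair",
    133: "paired, read unmapped, mate mapped, second in pair",
    99: "paired, proper pair, mate on reverse strand, first in pair",
}

for flag, description in examples.items():
    kept = passes_view_filter(flag, require=MATE_UNMAPPED)
    print(flag, "kept" if kept else "dropped", "-", description)
```

Note that, under this model, -f 8 only looks at the mate's status: it does not check whether the read itself is mapped.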
When I use samtools 'flagstat'
on the resulting files, I get a surprising result: the number of 'singletons' does not match the total number of reads, which seems odd to me.
The only reconciliation I can find is here:
http://seqanswers.com/forums/showthread.php?t=46711
One person who replied in that thread claims that singletons are sometimes defined as reads that have no partner read at all. However, that still doesn't explain my result. Flagstat says about 40% of my reads are singletons, but based on the 'view' command I used, I would expect ALL of them to be singletons.
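One way I could test this expectation is to break the filtered reads down by flag bits. The sketch below assumes that flagstat counts a read as a singleton only when it is paired, the read itself is mapped, and its mate is unmapped (ignoring secondary and supplementary alignments); that is my reading of the tool and should be checked against the samtools version in use. The classify function and the flag list are invented for illustration.

```python
from collections import Counter

# SAM flag bits (values from the SAM specification)
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify(flag):
    """Put one FLAG value into a category (hypothetical helper).

    Assumption: "singleton" means paired, read mapped, mate unmapped.
    """
    if not flag & PAIRED:
        return "unpaired"
    if not flag & MATE_UNMAPPED:
        return "mate mapped"
    if flag & READ_UNMAPPED:
        return "both unmapped"
    return "singleton"


# Made-up flags that would all pass `-f 8` (mate unmapped bit set)
flags = [73, 89, 77, 141, 77]

counts = Counter(classify(f) for f in flags)
for category, n in counts.items():
    print(category, n)
```

If that assumption holds, reads where both mates are unmapped would pass -f 8 without being counted as singletons, and running samtools view -c -f 8 -F 4 mateUnmapped.bam should give a count close to the singleton line in flagstat.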
Can a seasoned bioinformatician help me out?