Questions tagged [samtools]

Samtools is a suite of programs for interacting with high-throughput sequencing data.

It consists of three separate repositories:

  1. Samtools: Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
  2. BCFtools: Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
  3. HTSlib: A C library for reading/writing high-throughput sequencing data

Samtools and BCFtools both use HTSlib internally, but these source packages contain their own copies of htslib so they can be built independently.

115 questions
15
votes
3 answers

How to cache reads?

I am using python/pysam to analyze sequencing data. In its tutorial (pysam - An interface for reading and writing SAM files), the entry for the mate method says: 'This method is too slow for high-throughput processing. If a read needs to be processed…
user2725109
  • 2,286
  • 4
  • 27
  • 46
10
votes
3 answers

Make can't find curses.h

I have this program called samtools (version 1.3) that is used for manipulating the files that you get from DNA sequencing experiments. The downloaded program is contained in a folder. To set the program up I enter that folder in the terminal (on an…
Gaussia
  • 113
  • 1
  • 2
  • 8
6
votes
1 answer

In bioinformatics, what is a singleton?

I've quickly realized that bioinformatics is not a subject which has its terms clearly defined and easily accessible. I have an apparent discrepancy with some of my results. I used samtools view -b -h -f 8 fileName.bam > mateUnmapped.bam on several…
Mateo
  • 63
  • 1
  • 3
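As a rough illustration of what the `-f 8` filter in the excerpt above selects, here is a minimal sketch of the relevant SAM flag bits. The bit values (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) come from the SAM specification; the helper names are hypothetical, and the definition of "singleton" used here (a mapped read whose mate is unmapped) is one common usage, assumed rather than taken from the question.

```python
# SAM flag bits as defined in the SAM specification.
PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def passes_f8(flag: int) -> bool:
    """Mimic the `-f 8` filter: keep records whose mate-unmapped bit is set."""
    return bool(flag & MATE_UNMAPPED)


def is_singleton(flag: int) -> bool:
    """Hypothetical helper: a paired read that is mapped while its mate is not."""
    return (
        bool(flag & PAIRED)
        and not (flag & READ_UNMAPPED)
        and bool(flag & MATE_UNMAPPED)
    )
```

Note that `-f 8` alone also keeps pairs in which both reads are unmapped, so the two predicates can disagree on the same record.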
6
votes
4 answers

How to join the stdout of two subprocesses and pipe to stdin of new subprocess in python

Let's say I had the following command running from the shell:
    { samtools view -HS header.sam;           # command1
      samtools view input.bam 1:1-50000000;   # command2
    } | samtools view -bS - > output.bam      # command3
For those of you who aren't…
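One way to reproduce the shell grouping from the excerpt above in Python is to start the consumer first and hand its stdin pipe to each producer in turn. This is only a sketch under stated assumptions: `run_group_into` is a hypothetical helper name, and the consumer's output is sent to a file (as with `> output.bam`) so that no pipe has to be drained by the parent process.

```python
import subprocess


def run_group_into(producers, consumer, output_path):
    """Run `producers` sequentially, feeding their combined stdout to `consumer`.

    `producers` is a list of argument lists, `consumer` is one argument list,
    and the consumer's stdout is written to `output_path`.
    """
    with open(output_path, "wb") as out:
        sink = subprocess.Popen(consumer, stdin=subprocess.PIPE, stdout=out)
        try:
            for cmd in producers:
                # Each producer writes straight into the consumer's stdin pipe.
                subprocess.run(cmd, stdout=sink.stdin, check=True)
        finally:
            sink.stdin.close()  # signal end-of-input to the consumer
            returncode = sink.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, consumer)


# Usage with the commands from the question (requires samtools on PATH):
# run_group_into(
#     [["samtools", "view", "-HS", "header.sam"],
#      ["samtools", "view", "input.bam", "1:1-50000000"]],
#     ["samtools", "view", "-bS", "-"],
#     "output.bam",
# )
```

Running the producers one after another preserves the ordering of the shell's `{ ...; ...; }` group, so the header always arrives before the alignment records.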
5
votes
1 answer

samtools - dyld: Library not loaded: @rpath/libcrypto.1.0.0.dylib

I have seen a few other questions like this elsewhere but I can't seem to find a resolution that works for me. I am trying to run samtools using Python on Anaconda. I am running macOS Catalina. Here is the error: dyld: Library not loaded:…
user2416002
  • 99
  • 1
  • 7
4
votes
2 answers

How to activate conda environment in GitHub Actions?

I am setting up continuous integration using GitHub Actions. One of the prerequisites (samtools) is most easily installed by conda. The standard way to use installed packages is to activate the corresponding conda environment. I am looking for a way…
Timur Shtatland
  • 12,024
  • 2
  • 30
  • 47
4
votes
1 answer

How to build a simple main.cpp file using samtools C API

I am trying to compile (on Linux, using G++) a simple main.cpp program using samtools C API (https://github.com/samtools/samtools) that I have downloaded in the folder of my main.cpp file. I would like to have a very simple makefile compiling the…
P G
  • 43
  • 3
4
votes
0 answers

Oscillating processing speed in a python script using pysam.TabixFile to annotate reads

The initial question: I'm writing a bioinformatics script in Python (3.5) that parses a large (sorted and indexed) BAM file representing sequencing reads aligned on a genome, associates genomic information ("annotations") to these reads, and counts…
bli
  • 7,549
  • 7
  • 48
  • 94
4
votes
2 answers

Infer the length of a sequence using the CIGAR

To give you a bit of context: I am trying to convert a SAM file to BAM with: samtools view -bT reference.fasta sequences.sam > sequences.bam, which exits with the following error: [E::sam_parse1] CIGAR and query sequence are of different…
j91
  • 451
  • 1
  • 4
  • 14
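The sequence length implied by a CIGAR string, as asked about in the title above, can be computed by summing the operations that consume query bases (M, I, S, = and X in the SAM specification). The following is a minimal sketch; the function name `query_length_from_cigar` is hypothetical and not part of samtools or pysam.

```python
import re

# Operations that consume query (read) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_length_from_cigar(cigar: str) -> int:
    """Return the read length implied by a CIGAR string such as '5S90M2I3D10M'."""
    if not cigar or cigar == "*":
        raise ValueError("CIGAR is empty or unavailable ('*')")
    total = 0
    pos = 0
    for match in CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"unexpected characters in CIGAR at offset {pos}")
        pos = match.end()
        length, op = int(match.group(1)), match.group(2)
        if op in QUERY_CONSUMING:
            total += length
    if pos != len(cigar):
        raise ValueError(f"unexpected characters in CIGAR at offset {pos}")
    return total
```

Comparing this value with the length of the SEQ field for each record is one way to locate the lines that trigger the parser error quoted in the excerpt.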
3
votes
0 answers

No reads mapped in proper pairs, in paired-end sequencing bamfile using samtools

I am working with a bamfile of paired-end whole genome sequencing, and want to filter out reads from a specific genomic region that are not mapped in a proper pair (these sometimes indicate a structural variant). I am using samtools, and tried to…
Lisa
  • 31
  • 1
3
votes
2 answers

Why does popen2() hang between write and read calls?

I am trying to integrate use of samtools into a C program. This application reads data in a binary format called BAM, e.g. from stdin: $ cat foo.bam | samtools view -h - ... (I realize this is a useless use of cat, but I'm just showing how a BAM…
Alex Reynolds
  • 95,983
  • 54
  • 240
  • 345
2
votes
2 answers

Split a SAM file in Awk keeping N number of lines as header

I have a very big Sequence Alignment Map (SAM) file as depicted below @X YYYYYY ZZZZZ\ @X ssssss ddddd\ @X CCCCCC LLLLL > FFFFFF 117 ch1 16448 0 * = 16448 0 …
Somu
  • 25
  • 3
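The question above asks for Awk; purely as an illustration of the same idea in Python, here is a minimal sketch that treats the first N lines as the header and repeats them at the top of every chunk. The function name and parameters are hypothetical.

```python
from itertools import islice


def split_with_header(lines, n_header, chunk_size):
    """Yield lists of lines, each starting with the first `n_header` lines.

    `lines` is any iterable of lines; every chunk holds the header followed
    by at most `chunk_size` body lines.
    """
    it = iter(lines)
    header = list(islice(it, n_header))
    while True:
        body = list(islice(it, chunk_size))
        if not body:
            break
        yield header + body
```

Because the input is consumed lazily, a very large file can be passed as an open file object without loading it into memory at once.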
2
votes
2 answers

Extracting information from a file using another file with list of specific text; Ubuntu/ Linux

I have one file which has list of IDs, index/header, let's call it…
Caroline
  • 119
  • 8
2
votes
1 answer

samtools calmd is pretty slow

I am using "samtools calmd" to add the MD tag back to a BAM file. The size of the original BAM is around 50Gb (whole-genome sequencing using PacBio HiFi reads). The issue that I encountered is that "calmd" is incredibly slow! The jobs have…
Black No13
  • 41
  • 2
2
votes
1 answer

Cannot sort VCF with bcftools due to invalid input

I am trying to compress & index a VCF file and am facing several issues. When I use bgzip/tabix, it throws an error saying it cannot be indexed due to some unsorted values.
    # code used to bgzip and tabix
    bgzip -c fn.vcf > fn.vcf.gz
    tabix -p vcf…
srd
  • 21
  • 2