
I am using Bio::DB::Sam in a CentOS 7 environment, with samtools version 0.1.17. I installed it with this procedure:

wget http://sourceforge.net/projects/samtools/files/samtools/0.1.17/samtools-0.1.17.tar.bz2
tar xjf samtools-0.1.17.tar.bz2 && cd samtools-0.1.17
make CFLAGS=-fPIC
export SAMTOOLS=`pwd`
cpanm Bio::DB::Sam

which I found here (note that I changed the samtools version).

The crash (a segmentation fault) occurs intermittently: the same input files sometimes succeed and sometimes crash. My general procedure is as follows:

  1. Use bowtie to generate a .sam file from a .fastq file, using a custom bowtie index
  2. Use samtools to convert my .sam to a .bam, sorting and indexing the file along the way
  3. Issue the following Perl commands:

Perl:

my $sortbam = align_and_sort_and_index($reads_file);   # steps 1 and 2
my @all_gene_ids = qw(gene_id1 gene_id2 gene_id3);   # really lots more
for (my $worker=0; $worker <= $n_threads; $worker++) {
    my $pid = fork;
    die "fork error: $!" unless defined $pid;
    next if $pid;     # parent
    my @gene_ids = get_unique_subset(@all_gene_ids, $worker);
    my $sam = Bio::DB::Sam->new(-bam=>$sortbam, -fasta=>$ampl_seqfile, -autoindex=>0);
    foreach my $gene_id (@gene_ids) {
        # THIS NEXT LINE IS THE ONE THAT SEGFAULTS (SOMETIMES):
        my @alignments = $sam->get_features_by_location(-seq_id => $gene_id);
        # do something interesting with @alignments...
    }
    exit;
}

while ((my $pid=wait()) != -1) {
    print "reaped $pid\n";
}
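The `get_unique_subset` helper is not shown in the question, so the following is only a hypothetical sketch of what such a helper might do: deal the gene IDs out round-robin so that each ID goes to exactly one worker. It assumes the helper receives an array reference, the worker index, and the total worker count (the names and signature are illustrative, not the original code). Note that the loop above runs `$worker` from 0 to `$n_threads` inclusive, so a count-based helper like this would need `$n_threads + 1` as the worker count.

```perl
use strict;
use warnings;

# Hypothetical helper (the real get_unique_subset is not shown in the question).
# Takes an array *reference*, the worker index and the number of workers, and
# returns every $n_workers-th element starting at index $worker, so the
# subsets are disjoint and together cover every ID exactly once.
sub get_unique_subset {
    my ($ids, $worker, $n_workers) = @_;
    return @{$ids}[ grep { $_ % $n_workers == $worker } 0 .. $#$ids ];
}

# Usage: split five IDs between two workers.
my @all_gene_ids = qw(gene_id1 gene_id2 gene_id3 gene_id4 gene_id5);
my $n_workers    = 2;
for my $worker (0 .. $n_workers - 1) {
    my @subset = get_unique_subset(\@all_gene_ids, $worker, $n_workers);
    print "worker $worker: @subset\n";
}
```

Passing a reference keeps the worker index separate from the ID list; passing `@all_gene_ids` directly flattens it into `@_`, so the helper would have to `pop` the index off the end.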

To date I have tried the following:

  1. Increased the number of allowed open files (ulimit -n)
  2. Increased the number of allowed subprocesses
  3. Increased the limit of pipe buffers
  4. Increased the swap space

Any and all suggestions would be greatly appreciated. Thank you!

phonybone
  • If you run this without any of the `fork` code, do you see the same issue? If so, some gene IDs or BAM files may be malformed. I would simplify the code first, because the problem may not be the module. – SES Jun 14 '17 at 01:49
  • Also, you mention 3 different versions of samtools. Please clarify that this is a typo and one of those is installed/linked correctly. – SES Jun 14 '17 at 02:05
  • Unrelated, but take a look at https://metacpan.org/pod/Parallel::ForkManager. It will make your life easier. – simbabque Jun 14 '17 at 06:06
  • @SES samtools version numbers fixed, thank you. – phonybone Jun 15 '17 at 18:41
  • @SES: I have rewritten the code so that it does not fork, and I do not see the same issue; it seems to work fine. However, the code as I originally inherited it does fork, and, oddly enough, it seems to work fine when installed on Mac OS X machines. Also, if I manually throttle back the number of forks, it also seems to work, which initially made me suspect a resource issue (open file descriptors, etc.). – phonybone Jun 15 '17 at 18:44
  • @SES: the .bam file in question is created once, and used as an input to Bio::DB::Sam->new(), so I don't think it is malformed. Also, I can run the single-fork version of the code against the same .bam file and it seems ok, even after a crash, so I think the file is ok. I'll double check, though. Thx! – phonybone Jun 15 '17 at 18:46
  • @simbabque If I wind up going in and re-writing the code, I'll definitely look into Parallel::ForkManager. Thx! – phonybone Jun 15 '17 at 18:47
  • What's the `exit` statement doing there? – flies Oct 13 '17 at 18:51
  • @flies: The exit statement is there because each iteration of the loop spawns a new process (via the fork call), which does its work and then quits. – phonybone Oct 16 '17 at 20:44
  • But it's inside the loop over gene IDs, so it seems as though the forked processes will all quit after looking at the first gene? (I've hardly used fork at all, so IDK if I'm missing something.) – flies Oct 18 '17 at 14:18
  • I see what you're saying; there was a missing closing brace ('}') in the for-loop. You're correct that it would exit after one iteration. It was a typo. – phonybone Oct 19 '17 at 16:32
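Since throttling the number of forks reportedly avoids the crash, it can help to cap concurrent children and to record how each child actually ended. Parallel::ForkManager (suggested in the comments) packages this pattern, and its `run_on_finish` callback receives each child's exit code and signal. The following is a minimal core-Perl sketch of the same idea, not the original code: `run_pool` and its job code refs are hypothetical names, and each job stands in for one worker's `get_features_by_location` loop. The wait status in `$?` is decoded so that a child killed by a signal is reported by name; a segfaulting worker would show up as `signal SEGV`.

```perl
use strict;
use warnings;
use Config;

# Signal names indexed by number, from perl's build configuration
# (e.g. index 11 is "SEGV" and index 15 is "TERM" on Linux).
my @sig_name = split ' ', $Config{sig_name};

# Hypothetical helper: run each job (a code ref) in its own forked child,
# keeping at most $max_procs children alive at once. Returns a hash ref
# mapping each job's index to "exit N" or "signal NAME".
sub run_pool {
    my ($max_procs, @jobs) = @_;
    my (%job_of_pid, %result);

    # Block until one child finishes and record how it ended.
    my $reap = sub {
        my $pid = wait();
        die "wait failed: $!" if $pid == -1;
        my $status = $?;
        my $job    = delete $job_of_pid{$pid};
        return unless defined $job;    # not one of our children
        my $sig = $status & 127;       # low 7 bits: terminating signal, if any
        $result{$job} = $sig ? "signal $sig_name[$sig]" : 'exit ' . ($status >> 8);
    };

    for my $i (0 .. $#jobs) {
        # Throttle: wait for a free slot before forking another child.
        $reap->() while scalar(keys %job_of_pid) >= $max_procs;
        my $pid = fork;
        die "fork error: $!" unless defined $pid;
        if ($pid == 0) {               # child: run the job, then leave
            $jobs[$i]->();
            exit 0;
        }
        $job_of_pid{$pid} = $i;        # parent: remember which job this is
    }
    $reap->() while %job_of_pid;       # reap the remaining children
    return \%result;
}

# Usage: the last job simulates a crash by sending itself SIGTERM.
my $result = run_pool(
    2,
    sub { exit 0 },
    sub { exit 3 },
    sub { kill 'TERM', $$ },
);
print "job $_: $result->{$_}\n" for sort { $a <=> $b } keys %$result;
```

Recording the job index per PID, rather than only printing "reaped $pid", shows which subset of gene IDs was being processed when a child died, which makes it easier to rerun that subset without forking.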
