BASH script creates excess files (parallel)

Question

I have been modifying this script and it runs, allowing me to produce the desired output. However, I also get unwanted output as well. For example, the script seems to take the ends of files in the current directory (.txt, .bcf, .bam) and combines them into files ending .txt.bam or .bam.bcf. So the problem is that it creates a large number of excess files which are not needed.

#!/bin/bash

set - eu


genome=/storage1/ref_genome/GRCh38.dxv.fa
export genome


function parallel_call {
    bcftools mpileup \
        --fasta-ref ${genome} \
        --regions $2 \
        --output-type u \
        $1 | \
    bcftools call --multiallelic-caller \
              --variants-only \
              --output-type u - > ${1/.bam/}.$2.bcf
}

export -f parallel_call


function bam_chromosomes {
        samtools idxstats $1 | cut -f 1 | grep -w -f search_file.txt
}

export -f bam_chromosomes



chrom_set=`bam_chromosomes test_file.bam`
parallel --verbose -j 90% parallel_call test_file.bam ::: ${chrom_set}

The ideal output would be:

test_file.chr1.bcf
test_file.chr2.bcf
test_file.chr3.bcf
...
test_file.chr22.bcf

I have tried a few things such as specifying the exact patterns I want grep to pick up. But I have no idea why this is happening - unless my use of parallel is incorrect?

I would upload the files however they are very large and my internet is really slow currently. Any ideas would be great.

CRCh38.dxv.fa = human genome reference

search_file.txt = file containing patterns for grep to search. In this case it contains (chr1, chr2, chr3 ... chr22)

test_file.bam = a bam file used for testing

[Shellcheck](https://www.shellcheck.net/) shows several issues with the code. The most concerning issues are the ones relating to missing quotes. Since it appears that strange things are happening with files in the current directory that shouldn't be involved, it seems possible that there's a `*` in one of the variables that is being expanded to a list of files in the current directory due to the missing quotes. — pjh, Mar 01 '22 at 11:40
Fix the problems identified by [Shellcheck](https://www.shellcheck.net/) and try again. Also try running the program with tracing turned on (`set -o xtrace`, `set -x`, or invoke the program with `bash -x prog`). See [How can I debug a Bash script?](https://stackoverflow.com/q/951336/4154375) and [How to debug a bash script?](https://unix.stackexchange.com/q/155551/264812). You could also try turning off globbing (`set -o noglob`, `set -f`) to see if that makes any difference. I wouldn't use it as a substitute for proper quoting though. — pjh, Mar 01 '22 at 11:41
okay thanks for that, I'll fix the issues and see if it runs as desired. — Chip, Mar 01 '22 at 11:56

BASH script creates excess files (parallel)

0 Answers0