This bash script is meant to be part of a pipeline that processes zipped .vcf file that contain genomes from multiple patients (which means the files are huge even when zipped, like 3-5GB).
My problem is that I keep running out of memory when running this script. It is being run in a GCP high mem VM.
I am hoping there is a way to optimize the memory usage so that this doesn't fail. I looked into it but found nothing.
#!/bin/bash
for filename in ./*.vcf.gz; do
[ -e "$filename" ] || continue
name=${filename##*/}
base=${name%.vcf.gz}
bcftools query -l "$filename" >> ${base}_list.txt
for line in `cat ${base}_list.txt`; do
bcftools view -s "$line" "$filename" -o ${line}.vcf.gz
gzip ${line}.vcf
done
done