I am trying to use the vcftools package to calculate Weir and Cockerham's Fst. I would like to loop over two pairs of populations in the first instance, and then run each pair across all variants from the 1000 Genomes Project, where each chromosome is stored in a separate VCF file. For example, for pop1 vs pop2 and for pop3 vs pop4, calculate Fst for chromosomes 1-10. Each population file (for example, LWKfile) contains a list of the individuals that belong to that population.
I have attempted:
for population in LWK_GBR YRI_FIN; do
    # Split the pair name on the underscore, e.g. LWK_GBR -> LWK and GBR
    firstpop=${population%%_*}
    secondpop=${population##*_}
    # Run vcftools once per chromosome file for this pair
    for filename in *.vcf.gz; do
        vcftools --gzvcf "${filename}" \
            --weir-fst-pop "/outdir/${firstpop}file" \
            --weir-fst-pop "/outdir/${secondpop}file" \
            --out "/out/${population}_${filename}"
    done
done
However, this does not loop through all the files and seems to get stuck on chromosome 10. Is there a more efficient way to do this in bash? I am concerned that the nested loop will be too slow.
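
For reference, here is a rough sketch of one alternative I have been considering; I have not run it against the real data. Instead of calling vcftools inside the loop, a function prints one command per population pair and file, so the list can be inspected first and then handed to something like xargs -P (a GNU/BSD extension) to run several jobs at once. The function name list_fst_jobs, its three directory arguments, and the figure of 4 parallel jobs are my own placeholders. Unlike my loop, it strips .vcf.gz from the output prefix, and it assumes paths without spaces.

```bash
#!/usr/bin/env bash
# Sketch only: the function name and its arguments are hypothetical placeholders.

# Print one vcftools command per (population pair, VCF file) combination.
# Arguments: directory with VCFs, directory with population files, output directory.
list_fst_jobs() {
    local vcf_dir=$1 pop_dir=$2 out_dir=$3
    local pair firstpop secondpop vcf chrom
    for pair in LWK_GBR YRI_FIN; do
        firstpop=${pair%%_*}     # text before the underscore
        secondpop=${pair##*_}    # text after the underscore
        for vcf in "$vcf_dir"/*.vcf.gz; do
            [ -e "$vcf" ] || continue          # skip if the glob matched nothing
            chrom=$(basename "$vcf" .vcf.gz)   # strip directory and extension
            printf 'vcftools --gzvcf %s --weir-fst-pop %s --weir-fst-pop %s --out %s\n' \
                "$vcf" "$pop_dir/${firstpop}file" "$pop_dir/${secondpop}file" \
                "$out_dir/${pair}_${chrom}"
        done
    done
}

# Example use (commented out): preview the commands, then run up to 4 at once.
# list_fst_jobs . /outdir /out
# list_fst_jobs . /outdir /out | xargs -P 4 -I {} sh -c '{}'
```

Printing the commands first would also show the order in which the files are visited: the *.vcf.gz glob expands in lexicographic order, so depending on how the files are named, chromosome 10 may simply be the second file processed rather than the last.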