When I run this code:
sorted_chromosomes=($(printf "%s\n" "${chromosomes[@]}" | sort -u))
echo "chromosomes: ${chromosomes[@]}"
echo "sorted_chromosomes: ${sorted_chromosomes[@]}"
for chrom in "${sorted_chromosomes[@]}"; do
    echo "proxy: Running PLINK LD analysis for chromosome ${chrom}." > /dev/stderr
    local out_prefix="${SNAPTMP}/SNAP.${chrom}.proxy"
    plink --bfile "${PLINK_REF_PANEL}/${chrom}" \
        --r2 --ld-window 1500 --ld-window-r2 ${PLINK_MIN_R2} \
        --keep "${PLINK_REF_PANEL_KEEP}" \
        --out "${out_prefix}" \
        --ld-snp-list "${SNAPTMP}/SNAP.input.proxy" \
        > /dev/null # full path to plink was added
    retVal=$?
    if [ $retVal -eq 0 ]; then
        perl -pi -e "s/[ \t]+/ /g;" "${out_prefix}.ld"
        perl -pi -e "s/^[ \t]+//g;" "${out_prefix}.ld"
        perl -pi -e "s/[ \t]+$//g;" "${out_prefix}.ld"
    else
        echo "Error: PLINK analysis for chromosome ${chrom} failed." > /dev/stderr
    fi
done
The code works as expected, and I get this output:
chromosomes: 1 6
sorted_chromosomes: 1 6
proxy: Running PLINK LD analysis for chromosome 1.
proxy: Running PLINK LD analysis for chromosome 6.
I am trying to parallelize the PLINK LD analysis so that each chromosome runs as a separate job, to make my script more efficient. I am using GNU parallel, which is installed on my server:
[-----.-------@hydra1 SNAPPY]$ parallel --version
GNU parallel 20191022
Copyright (C) 2007-2019 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
For reference, I have 10 CPUs available:
[-----.-------@hydra1 SNAPPY]$ nproc
10
Here is the code I am using to parallelize the for loop:
sorted_chromosomes=($(printf "%s\n" "${chromosomes[@]}" | sort -u))
echo "chromosomes: ${chromosomes[@]}"
echo "sorted_chromosomes: ${sorted_chromosomes[@]}"
execute_plink(){
    local chrom="$1"
    echo "proxy: Running PLINK LD analysis for chromosome ${chrom}." > /dev/stderr
    local out_prefix="${SNAPTMP}/SNAP.${chrom}.proxy"
    plink --bfile "${PLINK_REF_PANEL}/${chrom}" \
        --r2 --ld-window 1500 --ld-window-r2 ${PLINK_MIN_R2} \
        --keep "${PLINK_REF_PANEL_KEEP}" \
        --out "${out_prefix}" \
        --ld-snp-list "${SNAPTMP}/SNAP.input.proxy" \
        > /dev/null #full
    retVal=$?
    if [ $retVal -eq 0 ]; then
        perl -pi -e "s/[ \t]+/ /g;" "${out_prefix}.ld"
        perl -pi -e "s/^[ \t]+//g;" "${out_prefix}.ld"
        perl -pi -e "s/[ \t]+$//g;" "${out_prefix}.ld"
    else
        echo "Error: PLINK analysis for chromosome ${chrom} failed." > /dev/stderr
    fi
}
export -f execute_plink
# Run PLINK jobs for each chromosome simultaneously using GNU Parallel
parallel execute_plink ::: "${sorted_chromosomes[@]}"
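One likely cause, offered as a hypothesis: GNU parallel runs each job in a new shell process, and `export -f execute_plink` exports only the function. If `SNAPTMP`, `PLINK_REF_PANEL`, `PLINK_REF_PANEL_KEEP` and `PLINK_MIN_R2` are ordinary (non-exported) shell variables, they are empty inside that new shell, so for chromosome 1 `--bfile` becomes `/1` and `--out` becomes `/SNAP.1.proxy`. The sketch below reproduces the effect with a plain `bash -c`, standing in for the shell parallel starts; the `/tmp/snap_demo` value is a made-up placeholder and nothing is written there:

```shell
#!/usr/bin/env bash
# Start clean in case SNAPTMP is already present in the environment.
unset SNAPTMP

# Placeholder value, set as an ordinary (non-exported) shell variable.
SNAPTMP="/tmp/snap_demo"

show_snaptmp() {
    echo "SNAPTMP='${SNAPTMP}'"
}
# Exporting the function makes it callable from child bash processes...
export -f show_snaptmp

# ...but a child shell only receives exported variables,
# so SNAPTMP is empty inside the function here.
before=$(bash -c show_snaptmp)
echo "$before"

# Once the variable is exported, the child shell sees its value.
export SNAPTMP
after=$(bash -c show_snaptmp)
echo "$after"
```

Assuming this is the cause, the corresponding change in the script would be to run `export SNAPTMP PLINK_REF_PANEL PLINK_REF_PANEL_KEEP PLINK_MIN_R2` (or use `export` where each is first set) before the `parallel execute_plink ::: ...` line.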
With this version, I get the following output instead:
chromosomes: 1 6
sorted_chromosomes: 1 6
Error: PLINK analysis for chromosome 1 failed.
Error: PLINK analysis for chromosome 6 failed.
I am not sure why each PLINK job fails under parallel when the same commands succeed in the serial loop.
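A clue in that output: none of the `proxy:` lines appear. By default (`--group`) GNU parallel collects each job's stderr in a temporary file and prints it when the job finishes. On Linux, `/dev/stderr` is a link into `/proc/self/fd/2`, so every `> /dev/stderr` reopens that file and truncates it, and only the last message survives. That would also wipe out whatever PLINK itself printed to stderr before the `Error:` line. Redirecting with `>&2` writes through the already-open descriptor instead. A minimal sketch that imitates parallel's temporary file with an ordinary `mktemp` file (the truncation described in the comments is Linux behavior; other systems may keep both lines):

```shell
#!/usr/bin/env bash
# Two temporary files stand in for the per-job stderr file parallel uses.
log_devstderr=$(mktemp)
log_fd2=$(mktemp)

# Version 1: each "> /dev/stderr" reopens the file behind fd 2.
# On Linux that open truncates the file, so only the last line is kept.
{
    echo "proxy: Running PLINK LD analysis for chromosome 1." > /dev/stderr
    echo "Error: PLINK analysis for chromosome 1 failed." > /dev/stderr
} 2> "$log_devstderr"

# Version 2: ">&2" writes through the already-open fd 2, so both lines stay.
{
    echo "proxy: Running PLINK LD analysis for chromosome 1." >&2
    echo "Error: PLINK analysis for chromosome 1 failed." >&2
} 2> "$log_fd2"

echo "--- with > /dev/stderr ---"; cat "$log_devstderr"
echo "--- with >&2 ---"; cat "$log_fd2"

# Keep a few facts for checking, then clean up.
lines_fd2=$(grep -c '' "$log_fd2")
last_devstderr=$(tail -n 1 "$log_devstderr")
rm -f "$log_devstderr" "$log_fd2"
```

In `execute_plink`, replacing each `> /dev/stderr` with `>&2` should let the `proxy:` lines, and PLINK's own error message, show up in parallel's output, which may point directly at the failing path or option.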