Apologies for the basic question, but I have not been able to find a solution on Google. I want to run a script separately for each chromosome in a list (called CHROMS in the code below) using parallelization via MPI. The script I am calling (some_script.sh) takes a chromosome parameter which I want to change for each call and another parameter which I want to keep constant across all calls. Basically, I want what the code below does, but with mpiexec instead of background processes.

#Run the pipeline for each chromosome separately.
#run_chromosome_iteration.sh
SOME_OTHER_PARAM="blah blah"
for c in $CHROMS; do
    #Quote the second argument so its embedded space is passed as a single parameter.
    "$SCRIPTS"/some_script.sh "$c" "$SOME_OTHER_PARAM" &
done
wait #Do not exit until every background process has finished.
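
For reference, the closest equivalent I can see with mpiexec is to launch one rank per chromosome and have a small wrapper pick its chromosome from its rank. This is only a sketch, assuming Open MPI (which exports OMPI_COMM_WORLD_RANK to each process; MPICH sets PMI_RANK instead), and chrom_wrapper.sh is a hypothetical helper:

#!/bin/bash
#chrom_wrapper.sh -- run some_script.sh for the chromosome matching this MPI rank.
#Assumes Open MPI; substitute the rank variable for other MPI implementations.
CHROM_ARRAY=($CHROMS)                    #Split the space-separated list into an array.
c=${CHROM_ARRAY[$OMPI_COMM_WORLD_RANK]} #Rank 0 takes the first chromosome, and so on.
exec "$SCRIPTS"/some_script.sh "$c" "$SOME_OTHER_PARAM"

It would then be launched with one rank per chromosome; -x (Open MPI syntax) forwards the needed variables to the other ranks:

mpiexec -n 11 -x CHROMS -x SCRIPTS -x SOME_OTHER_PARAM ./chrom_wrapper.sh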

Edit: I actually have two levels of parallelization going on in my project: I am running my model 100 times using 100 jobs, and I would like to parallelize each job across a set of chromosomes (11 in this case). Please see the code below:

#Submit all jobs.
for i in {1..100}; do
    qsub -v ITER=${i} run_chromosome_iteration.sh
done

So I could use job arrays at the lower level of parallelization, but that would result in 1100 independent jobs and would be less memory-efficient than sharing memory between parallel processes. I also cannot simply use background processes, because the amount of memory I need spans two compute nodes on my cluster. This is why I want to use MPI.
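
For what it's worth, here is roughly the shape I have in mind for run_chromosome_iteration.sh. This is only a sketch: the two-node request, the -hostfile flag, and the -x forwarding assume a Torque-style scheduler with Open MPI and would need adjusting for another setup.

#!/bin/bash
#PBS -l nodes=2:ppn=6
#run_chromosome_iteration.sh -- one model iteration, spread over 11 chromosomes.
#The two-node request above is hypothetical; adjust nodes/ppn to your cluster.
SOME_OTHER_PARAM="blah blah"
export CHROMS SCRIPTS SOME_OTHER_PARAM
#PBS_NODEFILE lists the nodes allocated to this job.
mpiexec -hostfile "$PBS_NODEFILE" -n 11 \
        -x CHROMS -x SCRIPTS -x SOME_OTHER_PARAM \
        "$SCRIPTS"/chrom_wrapper.sh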

Tara Eicher
  • It seems you want to use `mpiexec` to launch an embarrassingly parallel simulation that is based on a non-MPI app. This is not how you should work. I'd rather suggest you investigate how to use job arrays (assuming you are using a resource manager such as SLURM, PBS Pro, or other). – Gilles Gouaillardet Sep 14 '18 at 04:28
  • @GillesGouaillardet Thank you for your feedback. I have edited the question to include more information about the task, which I think is relevant. Can you provide some context on why MPI is "bad" for embarrassingly parallel tasks? – Tara Eicher Sep 14 '18 at 14:45
  • What does `some_script.sh` end up running? An MPI app? – Gilles Gouaillardet Sep 14 '18 at 15:14
  • If `foo` is not an MPI app (this is your case, if I understand it correctly), then `mpiexec foo` returns with the slowest process. This is highly suboptimal if there is some imbalance. Also, I do not see how you would share memory between non-MPI processes. Bottom line: running 1100 independent jobs looks simpler **and** more efficient to me. – Gilles Gouaillardet Sep 14 '18 at 15:28
  • @GillesGouaillardet That makes a lot of sense, thank you. – Tara Eicher Sep 14 '18 at 19:25
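
For completeness, the job-array approach suggested above could look roughly like the following under PBS Pro. This is a sketch: `-J` and `PBS_ARRAY_INDEX` are PBS Pro syntax (Torque uses `-t` and `PBS_ARRAYID`), and `run_one_chromosome.sh` is a hypothetical wrapper.

#Submit one array job per model iteration; each sub-job handles a single chromosome.
for i in {1..100}; do
    qsub -J 0-10 -v ITER=${i} run_one_chromosome.sh
done

#run_one_chromosome.sh: map the array index to a chromosome.
CHROM_ARRAY=($CHROMS)
c=${CHROM_ARRAY[$PBS_ARRAY_INDEX]}
"$SCRIPTS"/some_script.sh "$c" "$SOME_OTHER_PARAM"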

0 Answers