
When I have a series of jobs in a bash script (for instance in joblist.sh, where each line contains one job) and n CPUs available on my computer, I parallelize them by appending & to the end of every line and inserting wait after every n lines. This significantly speeds up processing, but it is not optimal: it waits until all n tasks finish and only then runs the next n tasks. It would be much better if, as soon as one job finishes, the next job from the task list took its place in the queue, while also taking the limited free memory into account. I was wondering whether there is a technique in bash, or a tool that can be used without root installation on the server, that could help with this. You can answer the question with GNU Parallel, but I would prefer a plain bash solution without it.

One solution (per Parallelize Bash script with maximum number of processes), without considering free memory, is

cat joblist.sh | parallel -j 12

My bash script for creating the parallelized job list (n = 12):

# Append " &" to each job line so it runs in the background
awk '{print $0"  &"}' joblist.sh > joblist1.sh
# Insert a "wait" after every 12 lines so each batch finishes before the next starts
awk '1;!(NR%12){print "wait";}' joblist1.sh > joblist_parallel.sh
chmod +x joblist_parallel.sh
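
For illustration, the generated joblist_parallel.sh then has the following shape (a minimal sketch using n = 3 instead of 12 for brevity; job1.sh … job6.sh are hypothetical placeholder commands):

./job1.sh  &
./job2.sh  &
./job3.sh  &
wait
./job4.sh  &
./job5.sh  &
./job6.sh  &
wait

Each batch must finish completely before the next batch starts, which is exactly the inefficiency described above.
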
  • Probably the `-n` option from `wait` if your bash supports it. – Jetchisel Aug 27 '22 at 21:36
  • When wait is invoked with the -n option, the command waits only for a single job from the given pids to complete and returns its exit status. The problem is that I have a list of m tasks and n CPUs, and I want to replace only the task that has finished with a new one until all m tasks are done, so it is a bit more complicated, but I think it might be useful – Mohammad Mohseni Aref Aug 27 '22 at 21:47
  • If you are not prepared to install **GNU Parallel**, why tag it? – Mark Setchell Aug 27 '22 at 21:48
  • I tagged it because it is possible to use it without root permission on the server, for instance via conda install -c conda-forge parallel – Mohammad Mohseni Aref Aug 27 '22 at 21:50
  • I have no idea why you would use `conda` to install **GNU Parallel**. You can install it without `conda` and without `root` and it can do what you need. – Mark Setchell Aug 27 '22 at 21:54
  • And you may still have xargs with `-P` option available. – KamilCuk Aug 27 '22 at 21:55
  • I strongly agree with you; it was just an example of how to install it without being root @MarkSetchell – Mohammad Mohseni Aref Aug 27 '22 at 21:56
  • I would like to use, for instance, the parallel command, but the problem is that it uses more than 40 CPU threads for n=40 jobs in a command like `parallel --jobs 40 < joblist.txt`. I was wondering how I can manage the load based on the joblist file – Mohammad Mohseni Aref Aug 27 '22 at 22:32
  • If that is actually your question, please click [edit] and clarify that you can/could use **GNU Parallel** but have some specific concerns. Such material belongs in the question, rather than being buried in comments. Thank you. – Mark Setchell Aug 27 '22 at 22:44
  • Please add to your question (no comment): What have you searched for, and what did you find? What have you tried, and how did it fail? Show your code. – Cyrus Aug 27 '22 at 23:47
  • Does this answer your question? [Parallelize Bash script with maximum number of processes](https://stackoverflow.com/questions/38160/parallelize-bash-script-with-maximum-number-of-processes) – pjh Aug 28 '22 at 00:49
  • Also see [ProcessManagement - Greg's Wiki](https://mywiki.wooledge.org/ProcessManagement). – pjh Aug 28 '22 at 00:50
  • According to your comment @pjh, I can answer my question for --max-procs=12: it could be `split -l 1 joblist.sh | xargs --max-args=1 --max-procs=12` – Mohammad Mohseni Aref Aug 28 '22 at 01:49
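
As a sketch of the xargs approach mentioned in the comments (assuming GNU xargs, and that every line of joblist.sh is a self-contained command with no embedded newlines), something like this keeps at most 12 jobs running and starts a new one as soon as a slot frees up:

# Feed joblist.sh to xargs line by line; -P 12 caps the number of
# concurrent jobs, and a new job is launched whenever one finishes.
xargs -d '\n' -P 12 -I CMD bash -c 'CMD' < joblist.sh

If you do end up using GNU Parallel after all, its --memfree option additionally delays new jobs until a given amount of memory is free, which plain xargs cannot do.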

1 Answer

Parallelizing jobs is not a simple task, and GNU Parallel is the right tool. But if you want to stick to bash, one solution is to use jobs.

jobs -lr lists all the tasks you started in the background with &:

#!/bin/bash

# Simulated job: tag each run with a random ID, print start/end times,
# and sleep for a random duration.
my_complex_bash_job () {
    JOB=$(echo "$RANDOM" | md5sum | cut -d' ' -f1)
    printf '[%s] JOB %02d begin %s\n' "$(date '+%F %T')" "$1" "$JOB"
    sleep $(( 4 * (2 + RANDOM % 10) ))
    printf '[%s] JOB %02d end   %s\n' "$(date '+%F %T')" "$1" "$JOB"
}

for J in $(seq 1 20)
do
    my_complex_bash_job "$J" &
    # Throttle: while more than 3 background jobs are still running,
    # poll once per second before launching the next one.
    R=$(jobs -lr | wc -l)
    while [ "$R" -gt 3 ]
    do
        sleep 1
        R=$(jobs -lr | wc -l)
    done
done
wait    # wait for the last background jobs to finish
– EchoMike444
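
If your bash is new enough for wait -n (bash 4.3+, as suggested in the comments), the one-second polling loop in the answer above can be replaced by blocking until any one background job exits. A minimal sketch of that variant, reusing my_complex_bash_job from the answer (max_jobs is a name introduced here for illustration):

#!/bin/bash
# Requires bash >= 4.3 for `wait -n`.
max_jobs=4

for J in $(seq 1 20)
do
    my_complex_bash_job "$J" &
    # Once the limit is reached, block until any one background job finishes.
    while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]
    do
        wait -n
    done
done
wait    # let the remaining jobs finish

This reacts immediately when a slot frees up instead of sleeping and re-polling.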