
(I searched and expected this question to have been asked before, but couldn't find anything quite like it, although there are plenty of similar questions.)

I want this for-loop to run in 3 different threads/processes, and `wait` seems to be the right command:

for file in 1.txt 2.txt 3.txt 4.txt 5.txt
do
        something lengthy &
        i=$((i + 1))
        wait $!
done

But this construct, I guess, just starts one thread and then waits until it is done before it starts the next one. I could place `wait` outside the loop, but how do I then

  1. access the PIDs?
  2. limit it to 3 threads?
d-b
  • Do I understand correctly that you want five mutually independent tasks to be processed in three threads (with queuing as-it-happens) and the sole purpose of `wait` is to make sure that nothing else happens before all five have exited? – Dario Apr 13 '18 at 18:54
  • You don't necessarily have to give `wait` a PID. If you call `wait` with no arguments it will wait on all background processes, so putting the `wait` after `done` will wait for all threads to complete. Not sure how to limit to 3 threads though... – 0x5453 Apr 13 '18 at 18:57
  • @Dario I have two functions, 1 and 2. 1 (the one above) can be parallelized, but 2 can't be run until all 5 files are processed. I have 4 cores and I need to leave one alone so everything else can run uninterrupted. If I understand your question correctly the answer is "yes". (A sketch of this structure follows after these comments.) – d-b Apr 13 '18 at 19:14
  • These are *processes,* not *threads.* – tripleee Apr 13 '18 at 19:39
  • `bash` by itself isn't really suitable for maintaining a process pool like this. – chepner Apr 13 '18 at 19:54
  • Could you describe how the similar questions you found *weren't* suited? ("I considered specific-other-question X, but it was suitable only for Y and my situation is Z"). Otherwise, it's hard to know *why* this shouldn't be closed as duplicative, since this is a general request we get a lot, and have answered and re-answered numerous times. – Charles Duffy Apr 13 '18 at 20:03
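
For reference, a minimal sketch of the structure d-b describes above, with the 3-job limit left out (the answers below add it). `stage1` and `stage2` are hypothetical stand-ins for the asker's two functions, and `wait` with no arguments is the barrier 0x5453 mentions:

#!/bin/bash
# stage1/stage2 are hypothetical stand-ins for the asker's two functions
stage1() { echo "begin $1"; sleep 2; echo "done $1"; }   # parallelizable per file
stage2() { echo "all five files processed"; }            # must not start earlier

for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    stage1 "$file" &          # run each file's work in the background
done
wait                          # no arguments: block until every background job has exited
stage2                        # safe to run only now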

4 Answers


The jobs builtin can list the currently running background jobs, so you can use that to limit how many you create. To limit your jobs to three, try something like this:

for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
  if [ $(jobs -r | wc -l) -ge 3 ]; then
    wait $(jobs -r -p | head -1)
  fi

  # Start a slow background job here:
  (echo Begin processing $file; sleep 10; echo Done with $file)&
done
wait # wait for the last jobs to finish
Rob Davis
  • More than one job could complete by the time the job you choose to wait on completes. This isn't a good way to keep your process pool busy. – chepner Apr 13 '18 at 19:52
  • (`wait -n`, introduced in `bash` 4.3, is an improvement, in that you only have to block until an arbitrary process completes, but that doesn't mean that *only* one process has completed, and jobs can continue to complete while you are deciding how many new processes you can start. A sketch of this approach follows after these comments.) – chepner Apr 13 '18 at 19:54
  • True, although more importantly the job that we're waiting on may actually be the last of the three to finish -- who knows -- so it's not optimal. As you say in the question comments, bash on its own isn't really suitable for managing concurrency. However, given the bash primitives, this is a relatively simple way to avoid going over the process limit, even though it may underutilize the pool. – Rob Davis Apr 13 '18 at 21:56
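
A rough sketch of the `wait -n` variant chepner describes, assuming bash 4.3 or newer:

for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
  while [ "$(jobs -r | wc -l)" -ge 3 ]; do
    wait -n          # bash >= 4.3: returns as soon as any one background job exits
  done
  (echo "Begin processing $file"; sleep 10; echo "Done with $file") &
done
wait                 # wait for the last jobs to finish

As chepner notes, more than one job may have finished by the time `wait -n` returns, but the running count is re-checked on every pass of the `while` loop, so a new job starts whenever fewer than three are running.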

GNU Parallel might be worth a look.

My first attempt,

parallel -j 3 'bash -c "sleep {};   echo {};"' ::: 4 1 2 5 3

can, according to the inventor of parallel, be shortened to

parallel -j3 sleep {}\; echo {} ::: 4 1 2 5 3

which prints, in order of completion:

1
2
4
3
5

and quoting the semicolon instead, which is friendlier to type, like this:

parallel -j3 sleep {}";" echo {} ::: 4 1 2 5 3

works too.

It doesn't look trivial, and I have only tested it twice so far, once to answer this question. `parallel --help` points to a source with more info; the man page is a little bit shocking. :)

parallel -j 3 "something lengthy {}" ::: {1..5}.txt

might work, depending on whether `something lengthy` is a program (fine) or just bash code (afaik, you can't just call a bash function in parallel with parallel; a possible workaround is sketched below).
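
A hedged sketch of that workaround: export the function so that child bash shells can see it (this assumes GNU parallel runs the jobs under bash, which it normally does when started from bash, and `something_lengthy` is a hypothetical placeholder). Alternatively, wrap the bash code in `bash -c "..."` as in the first example above.

something_lengthy() { echo "begin $1"; sleep 2; echo "done $1"; }   # hypothetical function
export -f something_lengthy    # exported functions are visible to child bash shells
parallel -j 3 something_lengthy {} ::: 1.txt 2.txt 3.txt 4.txt 5.txt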

On Xubuntu Linux 16.04, parallel wasn't installed by default, but it is available in the repositories.
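
For example, on Debian/Ubuntu-based systems it can usually be installed from the standard repositories (the package is simply named parallel):

sudo apt-get install parallel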

user unknown
  • First example shorter: `parallel -j3 sleep {}\; echo {} ::: 4 1 2 5 3` – Ole Tange Apr 17 '18 at 09:39
  • @OleTange: Hi Ole, and thanks for parallel. Seen 3 or 4 of your videos so far, and tutorial is open in one of the 40 tabs, waiting for me to have some more time. – user unknown Apr 17 '18 at 12:28

Building on Rob Davis' answer:

#!/bin/bash
qty=3

for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    while [ "$(jobs -r | wc -l)" -ge "$qty" ]; do
        sleep 1
        # jobs   # uncomment if you want an update every second on what is running
    done
    something_lengthy "$file" &
    echo "Begin processing $file (PID $!)"
done
wait
Ljm Dullaart

You can use a subshell approach, for example:

 ( (sleep 10) &
    p1=$!
    (sleep 20) &
    p2=$!
    (sleep 15) &
    p3=$!
    wait
    echo "all finished ..." )

Note that the `wait` call waits for all children inside the subshell. You can use the modulo operator (%) with 3 and use the remainder to check for the 1st, 2nd and 3rd process IDs (if needed), or use it to run 3 jobs at a time. Hope this helps.
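
A minimal sketch of the modulo idea, using a hypothetical `something_lengthy` stand-in. Note that this processes the files in batches of three, so each batch waits for its slowest job rather than keeping three jobs busy at all times:

something_lengthy() { sleep 2; echo "done $1"; }   # stand-in for the real work

i=0
for file in 1.txt 2.txt 3.txt 4.txt 5.txt; do
    something_lengthy "$file" &
    i=$((i + 1))
    if [ $((i % 3)) -eq 0 ]; then   # after every third job has been started...
        wait                        # ...wait for that whole batch to finish
    fi
done
wait                                # wait for the final, partial batch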

Satyam Naolekar