I have a shell script that is doing some heavy work and keeping track of what it's doing using counters. The heavy work needs to be done for around 30 cases, and as the machine I'm running it on has 48 cores, it is easy to parallelise: I simply run the function in the background, creating a subshell. However, this means I cannot increment a global variable:

```
$ cat mwe.sh
#!/bin/bash
counter1=0
counter2=0
counter3=0
func() {
    # do some heavy work which may increase one counter
    counter2=$((counter2 + 1))
}

func &
func &
func &
wait

echo $counter1 $counter2 $counter3
$ ./mwe.sh
0 0 0
```

Bash functions do not have return values as such. Answers to *Counter increment in Bash loop not working* suggest writing to a file and then reading from it, but that would mean a temporary file for each individual call, which adds overhead. Other suggestions are to use `echo` to "return" a string, but if I use stdout for that, the functions cannot write anything else to stdout.
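For comparison, here is a minimal sketch of that shared-file approach, reduced to one append-only file for *all* calls rather than one file per call. The token names and the `mktemp` usage are my own choices, and it assumes each report is a single short `echo`, which append mode keeps from interleaving:

```bash
#!/bin/bash
# Sketch: every job appends one token per increment to a single shared file;
# the parent tallies the tokens after wait. Token names are illustrative.
tmp=$(mktemp)

func() {
    # ... heavy work which may increase one counter ...
    echo counter2 >>"$tmp"    # one short line, appended atomically
}

func &
func &
func &
wait

counter1=$(grep -c '^counter1$' "$tmp")   # grep -c prints 0 if nothing matches
counter2=$(grep -c '^counter2$' "$tmp")
counter3=$(grep -c '^counter3$' "$tmp")
rm -f "$tmp"

echo "$counter1 $counter2 $counter3"      # prints: 0 3 0
```

That cuts it down to a single file, but every increment still touches the filesystem, which is the overhead I was hoping to avoid.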

How can I keep track of global counters in a bash shell script, when the heavy work is done in subshells operating in parallel? Is there some way to open a dedicated stream/pipe for each function call, to which the function can write and from which the caller can read? Or is there some other way in which I can keep track of this, without writing a file for each call?
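The closest I can picture to such a dedicated channel is one FIFO opened read-write on a spare file descriptor that every background job inherits. This is only a sketch of the idea, not something from the linked answers; the descriptor number 3, the 0.1-second drain timeout, and the token names are all my own assumptions:

```bash
#!/bin/bash
# Sketch: a single FIFO as a shared return channel. Holding it open
# read-write (3<>) means open() never blocks and read never sees EOF.
fifo=$(mktemp -u)          # path only (slightly racy); mkfifo makes the pipe
mkfifo "$fifo"
exec 3<>"$fifo"
rm "$fifo"                 # the open FD keeps the pipe alive; the name can go

counter1=0 counter2=0 counter3=0

func() {
    # ... heavy work; stdout stays free for normal output ...
    echo counter2 >&3      # short writes below PIPE_BUF do not interleave
}

func &
func &
func &
wait                       # all writers are done, so the pipe holds all tokens

# Drain the buffered tokens; -t times out once the pipe is empty.
while read -r -t 0.1 -u 3 name; do
    case $name in
        counter1) counter1=$((counter1 + 1)) ;;
        counter2) counter2=$((counter2 + 1)) ;;
        counter3) counter3=$((counter3 + 1)) ;;
    esac
done
exec 3>&-

echo "$counter1 $counter2 $counter3"   # prints: 0 3 0
```

Because the parent itself holds a write end open, the drain loop cannot rely on EOF and instead stops on the `read -t` timeout, which is safe here only because `wait` has already guaranteed that no writer is left.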

gerrit
  • Global variables are something that exist within a single process. You are trying to share the `counter` variables across *multiple* processes, which makes them a shared resource, so you need some sort of interprocess communication between the parent and its children in order for the parent to update the values. – chepner Dec 21 '18 at 13:04
  • Shell functions are really more like small programs than true functions in the programming-language sense, and even more so if you run them in a background process. – chepner Dec 21 '18 at 13:05
  • @chepner Yes — which unfortunately cannot be a return value in bash, but I was hoping a dedicated stream / pipe may work. But I guess I'll just resort to writing a file for each call, then read from those. – gerrit Dec 21 '18 at 13:06
  • @gerrit I would recommend switching your bash script for an app written in Go. The concurrency model is awesome and you will also gain a huge performance boost for whatever it is that you are trying to accomplish. – Matias Barrios Dec 21 '18 at 15:54
  • @MatiasBarrios That sounds good, with the one caveat that I first need to learn Go :) I'm actually using this to submit about 50k jobs to a load scheduler; the jobs are written in Python and could certainly be optimised speed-wise, but I'm the only one running them and the trade-off between development time and runtime in this case, along with the luxury of [a decent cluster](https://help.jasmin.ac.uk/article/211-lotus-hardware), means it's not quite worth my time to learn Go or otherwise improve my performance. But the job submission itself is much faster if I do it in parallel. – gerrit Dec 21 '18 at 23:16

0 Answers