
I'm running a computationally heavy program on a list of files in bash. If I process the files one at a time, I don't make full use of my computer's processing power, yet if I append & to the command to run them all as background processes, I end up running far too many at once. What I'm looking for is a way to specify that I want n processes to work through a particular list of files: when one process finishes its file, it moves on to the next.

As a minimal example, here is some setup code to replicate my situation:

$ mkdir test
$ cd test

$ for i in {1..1000}
> do
>     echo "$i" >> $i.txt
> done

How would I use (say) only 2 processes to work through this list of files, so that each file ends up containing the result of some arbitrary operation on the number $i (maybe adding two), followed by "done by process 1" or "done by process 2", depending on which process performed the operation?

CiaranWelsh
  • Possible duplicate of [How to parallelize for-loop in bash limiting number of processes](https://stackoverflow.com/questions/38774355/how-to-parallelize-for-loop-in-bash-limiting-number-of-processes) – Aserre Jun 21 '19 at 10:13
  • Here's how you can do it in pure shell, possibly without nonstandard external utils: https://unix.stackexchange.com/a/216475/23692. – Petr Skocik Jun 21 '19 at 11:33

2 Answers


Your example is not very realistic, so it is hard to give more specific advice, but you can use GNU Parallel for this.

Say you want to run HeavyProcessing on all files starting with SeriousData using two CPU cores in parallel:

parallel -j 2 HeavyProcessing ::: SeriousData*

For a slightly different example, say the filenames you want to process are listed in a file called FileList.txt, and you want to run one process per CPU core and also get a progress bar:

parallel -a FileList.txt --bar HeavyProcessing
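
To map this onto the question's files, here is a sketch (assuming GNU Parallel's replacement strings {}, the input filename, and {%}, the job slot, which is 1 or 2 when running with -j 2); it overwrites each file with its number plus two and the slot that processed it:

parallel -j 2 'n=$(cat {}); echo "$(( n + 2 )) done by process {%}" > {}' ::: *.txt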
Mark Setchell

One solution could be xargs -P, but it requires some extra noise to make it work. Here is a solution that maps to your example:

printf '%s\0' {1..1000} | xargs -0 -rn1 -P2 bash -c 'echo "$1" >> "$1".txt' --

Explanation:

  • -0: separate input parameters at \0 byte (because that's what printf '%s\0' … sends)
  • -r: don't run anything, if there is no input
  • -n1: use only one input parameter per process
  • -P2: use at most 2 parallel processes
  • bash -c '…' --: the program to run; when a shell is run from xargs, the trailing -- is bound as $0 so that the argument xargs appends lands in $1
  • 'echo "$1" >> "$1".txt': the actual piece of shell code (an adapted sketch for the question's "add two" operation follows below)
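
Adapting the same pattern to the question's "add two" operation could look roughly like this sketch (it overwrites each file with its number plus two; note that plain xargs does not expose a worker-slot number, so the "done by process 1 or 2" part is easier with GNU Parallel's {%}):

printf '%s\0' {1..1000} | xargs -0 -rn1 -P2 bash -c 'echo "$(( $1 + 2 ))" > "$1".txt' --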

These last two pieces get a lot simpler if the bulk of the work you want to run does not require shell features like redirection: then you can run your program directly from xargs, without the bash -c indirection.
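
For example, assuming a hypothetical executable ./heavy_processing that takes a single filename argument, the sketch shrinks to something like:

printf '%s\0' *.txt | xargs -0 -rn1 -P2 ./heavy_processing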

Robin479