I run a very simple shell script that performs some transformations on files I download every day. Typically the download is a zip archive with six files in it, which I then process in five different steps before inserting the content into a database. The first step takes 5-8 minutes per file and is CPU-bound.
I have two computers I perform this task on, one with two cores and one with four cores plus hyperthreading. Since the first step currently takes 30+ minutes, I would like to run it in parallel.
The first step is basically
for file in *.txt; do
    dosomething "$file" "$file.csv"
done
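For concreteness, the closest I have gotten is backgrounding each job and then waiting, sketched below. (Here dosomething is stood in for by a trivial tr call so the snippet is self-contained; the real command is my own script.) The problem with this version is that it starts every file at once, with no cap on concurrency:

```shell
# Placeholder for the real processing step: just upper-cases the input.
dosomething() { tr 'a-z' 'A-Z' < "$1" > "$2"; }

# Work in a throwaway directory with two sample files.
dir=$(mktemp -d); cd "$dir"
printf 'one\n' > 1.txt
printf 'two\n' > 2.txt

# Background every job, then block until all of them have exited.
# Note: no concurrency limit -- nine files means nine processes at once.
for file in *.txt; do
    dosomething "$file" "$file.csv" &
done
wait   # returns only after every background job has finished
```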
On my 2-core computer I would like to process two files in parallel; on my 8-thread machine I would like to process all six files in parallel (and it would be nice if, on a day when the archive contains nine files, that were handled gracefully too). All files must be processed before the next step (which is much faster).
How do I start a suitable number of threads/processes, and then hold off on executing the next step until the previous step has completely finished?
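The kind of thing I imagine, if it helps clarify the question: some way to cap the number of workers at the core count and still block before the next step. A sketch, assuming GNU xargs (for -P) and coreutils nproc are available, with the real dosomething again stood in for by a trivial tr call:

```shell
# Throwaway directory with two sample files.
dir=$(mktemp -d); cd "$dir"
printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt

# xargs caps concurrency at $(nproc) workers and does not return until
# every child it spawned has exited, so the next step can follow directly
# on the next line of the script.
printf '%s\0' *.txt |
    xargs -0 -n 1 -P "$(nproc)" sh -c 'tr "a-z" "A-Z" < "$1" > "$1.csv"' _
```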