
I have a bash script which processes each file in a directory:

for (( index=0; index<$COUNT; index++ ))
do
    srcFile=${INCOMING_FILES[$index]}
    ${SCRIPT_PATH}/control.pl ${srcFile} >> ${SCRIPT_PATH}/${LOG_FILE} &
    wait ${!}
    removeIncomingFile ${srcFile}
done

For a few files it works fine, but when the number of files is large it is too slow. I want to run the script in parallel on grouped files.

Example files:

server_1_1 | server_2_1 | server_3_1
server_1_2 | server_2_2 | server_3_2
server_1_3 | server_2_3 | server_3_3

The script should process the files for each server in parallel:
First instance - server_1*
Second instance - server_2*
Third instance - server_3*

Is this possible with GNU Parallel, and how can it be achieved? Many thanks for any solution!
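For illustration, the desired behaviour in plain bash (no GNU Parallel yet; `process` is a placeholder for the real control.pl and removeIncomingFile calls) might look like:

```shell
#!/bin/bash
# Sketch: one background loop per server, files within a server
# handled sequentially. "process" stands in for the real work.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Create the example files from the question.
for s in 1 2 3; do
    for n in 1 2 3; do
        touch "server_${s}_${n}"
    done
done

process() {
    # stand-in for: control.pl "$1" >> "$LOG_FILE"; removeIncomingFile "$1"
    echo "processed $1" >> "results_$2"
}

for s in 1 2 3; do
    (
        for f in server_${s}_*; do   # sequential within one server
            process "$f" "$s"
        done
    ) &                              # each server group runs in parallel
done
wait
cat results_1
```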

Peter F
  • Why run just one command in background and then wait? Makes more sense if doing several at once... – Paul Hodges Oct 30 '18 at 20:32
  • 2
    [this response](https://stackoverflow.com/questions/52971764/for-loop-bash-scripts-parallel/52972662#52972662) might help you. It is logic to spawn commands in background and wait for them. There's even a POC version of a spooling script. That page also has lots of useful info about `parallel`. – Paul Hodges Oct 30 '18 at 20:35
  • Nothing in your code relates to the server numbers you mention! What are the pipe symbols (`|`) trying to tell me? – Mark Setchell Oct 30 '18 at 21:10

2 Answers


I can't make head nor tail of what your question is trying to say, but I suspect the following will make a reasonable starting point. Put your actual code inside the quotes in place of the dummy actions I have used:

#!/bin/bash

# Do stuff for server 1
parallel -k 'echo server_1_{} ; date >> log_1_{}' ::: {1..3}

# Do stuff for server 2
parallel -k 'echo server_2_{} ; date >> log_2_{}' ::: {1..3}

# Do stuff for server 3
parallel -k 'echo server_3_{} ; date >> log_3_{}' ::: {1..3}

Sample Output

server_1_1
server_1_2
server_1_3
server_2_1
server_2_2
server_2_3
server_3_1
server_3_2
server_3_3

Log files created

-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_3
Mark Setchell

The grouping part confuses me.

I have the feeling you want them grouped because you do not want to overload the server.

Normally you would simply do:

parallel "control.pl {}; removeIncomingFile {}" ::: incoming/files* > my.log

This will run one job per CPU thread.

Consider spending 20 minutes reading chapters 1+2 of "GNU Parallel 2018" (printed, online). I think it will help you understand the basic uses of GNU Parallel.

Ole Tange
  • Thanks for the answer. I have a lot of servers which should be monitored. Prepared files from each server are delivered to one machine, where a script processes them in date order. The date is part of the file name, e.g. TYPE1_server01_20181030_194002.out. After a file is processed, the data are inserted into a database. Based on this I can prepare availability reports etc. I want to process those files in parallel, for each server in date order. – Peter F Oct 31 '18 at 17:53
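Given that clarification, one possible sketch in plain bash (filenames follow the hypothetical pattern from the comment; the `echo` stands in for control.pl and the database insert):

```shell
#!/bin/bash
# Sketch: one background job per server; within each server the files
# are taken in timestamp order (date/time embedded in the filename).
set -euo pipefail
workdir=$(mktemp -d); cd "$workdir"

# Example files in the TYPE1_<server>_<date>_<time>.out pattern.
touch TYPE1_server01_20181030_194002.out TYPE1_server01_20181029_120000.out \
      TYPE1_server02_20181030_100000.out

for srv in server01 server02; do
    (
        # sort on the date and time fields (3rd/4th '_'-separated columns)
        for f in $(ls TYPE1_"${srv}"_*.out | sort -t_ -k3,4); do
            echo "$f" >> "order_${srv}"   # stand-in for control.pl "$f"
        done
    ) &                                   # servers run in parallel
done
wait
cat order_server01
```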