
I have a bash script which processes each file in a directory:

for (( index=0; index<$COUNT; index++ ))
do
    srcFile=${INCOMING_FILES[$index]}
    ${SCRIPT_PATH}/control.pl ${srcFile} >> ${SCRIPT_PATH}/${LOG_FILE} &
    wait ${!}
    removeIncomingFile ${srcFile}
done

For a few files it works fine, but when the number of files is large it is too slow. I want to run the script in parallel on grouped files.

Example files:

server_1_1 | server_2_1 | server_3_1
server_1_2 | server_2_2 | server_3_2
server_1_3 | server_2_3 | server_3_3

The script should process the files for each server in parallel:
First instance - server_1*
Second instance - server_2*
Third instance - server_3*

Is this possible with GNU Parallel, and how can it be achieved? Many thanks for any solution!
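For illustration, the desired behaviour in plain bash (no GNU Parallel yet; `process` is a placeholder for the real control.pl and removeIncomingFile calls) might look like:

```shell
#!/bin/bash
# Sketch: one background loop per server, files within a server
# handled sequentially. "process" stands in for the real work.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Create the example files from the question.
for s in 1 2 3; do
    for n in 1 2 3; do
        touch "server_${s}_${n}"
    done
done

process() {
    # stand-in for: control.pl "$1" >> "$LOG_FILE"; removeIncomingFile "$1"
    echo "processed $1" >> "results_$2"
}

for s in 1 2 3; do
    (
        for f in server_${s}_*; do   # sequential within one server
            process "$f" "$s"
        done
    ) &                              # each server group runs in parallel
done
wait
cat results_1
```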

Peter F
  • Why run just one command in background and then wait? Makes more sense if doing several at once... – Paul Hodges Oct 30 '18 at 20:32
  • 2
    [this response](https://stackoverflow.com/questions/52971764/for-loop-bash-scripts-parallel/52972662#52972662) might help you. It is logic to spawn commands in background and wait for them. There's even a POC version of a spooling script. That page also has lots of useful info about `parallel`. – Paul Hodges Oct 30 '18 at 20:35
  • Nothing in your code relates to the server numbers you mention! What are the pipe symbols (`|`) trying to tell me? – Mark Setchell Oct 30 '18 at 21:10

2 Answers


I can't make head nor tail of what your question is trying to say, but I suspect the following will make a reasonable starting point. Put your actual code inside the quotes in place of the dummy actions I have used:

#!/bin/bash

# Do stuff for server 1
parallel -k 'echo server_1_{} ; date >> log_1_{}' ::: {1..3}

# Do stuff for server 2
parallel -k 'echo server_2_{} ; date >> log_2_{}' ::: {1..3}

# Do stuff for server 3
parallel -k 'echo server_3_{} ; date >> log_3_{}' ::: {1..3}

Sample Output

server_1_1
server_1_2
server_1_3
server_2_1
server_2_2
server_2_3
server_3_1
server_3_2
server_3_3

Log files created

-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_1_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_2_3
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_1
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_2
-rw-r--r--  1 mark  staff     29 30 Oct 21:04 log_3_3
Mark Setchell

The grouping part confuses me.

I have the feeling you want them grouped because you do not want to overload the server.

Normally you would simply do:

parallel "control.pl {}; removeIncomingFile {}" ::: incoming/files* > my.log

This will run one job per CPU thread.

Consider spending 20 minutes reading chapters 1+2 of "GNU Parallel 2018" (printed, online). I think it will help you understand the basic uses of GNU Parallel.

Ole Tange
  • Thanks for the answer. I have a lot of servers which should be monitored. Prepared files from each server are delivered to one machine, where a script processes them in date order. The date is part of the file name, e.g. TYPE1_server01_20181030_194002.out. After a file is processed, the data are inserted into a database. Based on this I can prepare availability reports etc. I want to process those files in parallel, for each server in date order. – Peter F Oct 31 '18 at 17:53
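Given that clarification, one possible sketch in plain bash (filenames follow the hypothetical pattern from the comment; the `echo` stands in for control.pl and the database insert):

```shell
#!/bin/bash
# Sketch: one background job per server; within each server the files
# are taken in timestamp order (date/time embedded in the filename).
set -euo pipefail
workdir=$(mktemp -d); cd "$workdir"

# Example files in the TYPE1_<server>_<date>_<time>.out pattern.
touch TYPE1_server01_20181030_194002.out TYPE1_server01_20181029_120000.out \
      TYPE1_server02_20181030_100000.out

for srv in server01 server02; do
    (
        # sort on the date and time fields (3rd/4th '_'-separated columns)
        for f in $(ls TYPE1_"${srv}"_*.out | sort -t_ -k3,4); do
            echo "$f" >> "order_${srv}"   # stand-in for control.pl "$f"
        done
    ) &                                   # servers run in parallel
done
wait
cat order_server01
```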