
I have a shell script that reads an input file of 16,000 lines, and it takes more than 8 hours to run. To reduce that, I split the input into 8 files and used a for loop to iterate over the 8 files, with an inner while loop to read the records from each file. But it is not working. How can I run 8 instances in parallel in the background? I need help running this more efficiently, for example using functions or forked processes.

Here is the code

for file in "$MY_WORK/CCN_split_files"/*
do
    echo "$file"
    echo "begin read loop"
    ### removing the header record from the file ###
    if [ "$file" == "$MY_WORK/CCN_split_files/ccn.email.list.file00" ] 
    then
        mv "$MY_WORK/CCN_split_files/ccn.email.list.file00" "$MY_WORK/raw_file"
        sed -e '/ Regular  /d; / Duplicate  /d' "$MY_WORK/raw_file" > "$MY_WORK/CCN_split_files/ccn.email.list.file00"
    fi
    ### end of removing header record  ###

    while read -r record
    do
      reccount=$(( reccount + 1 ))

        ### parse input record (fields are separated by vertical-tab, VT, characters)

          vt=$(printf '\v')
          contact_email=$(echo "$record" | cut -f5 -d"$vt")
              echo "contact email is $contact_email"
          credit_card_id=$(echo "$record" | cut -f6 -d"$vt")
              echo "credit card id is $credit_card_id"
          ref_nr=$(echo "$record" | cut -f7 -d"$vt")
              echo "reference nr is $ref_nr"
          cny_cd=$(echo "$record" | cut -f8 -d"$vt")
              echo "country code is $cny_cd"
          lang=$(echo "$record" | cut -f9 -d"$vt")
              echo "language is $lang"
          pmt_ir=$(echo "$record" | cut -f13 -d"$vt")
              echo "payment ir is $pmt_ir"

        ### set paypal or credit card 

          if [ "$pmt_ir" = "3" ]
            then
              pmt_typ="PP"
              echo "payment type is $pmt_typ"
          else
              pmt_typ="CC"
              echo "payment type is $pmt_typ"
          fi

        ### retrieve doc from application

          echo "retrieve from CMOD for $ref_nr"
          GetExit01Cntr=0
          GetExit01='F'
          until [[ $GetExit01 = 'T' ]]
           do
            GetExit01Cntr=$(( GetExit01Cntr + 1 ))

            /opt/ondemand/bin/arsdoc get -ac -d $MY_WORK -h $host -u $user -p $pwd -v -i  "WHERE ReferenceNumber='$ref_nr' AND CreditCardId='$credit_card_id'" -f "$folder" -L1 -o "$notify_afp" -v 2> $MY_WORK/$arsdoc_out
            if grep "Retrieving 1 document(s)." $MY_WORK/$arsdoc_out > /dev/null
            then
               GetExit01='T'
               echo "CCN AFP retrieval successful"
            else
               echo "CCN AFP retrieval failed - Performing retry (${GetExit01Cntr})"
               sleep 30
               GetExit01='F'
               if [[ $GetExit01Cntr -ge 3 ]]
               then
                  echo "Max Retry Failure: (GetExit01) - Failed to successfully perform arsdoc get"
                  echo "CCN AFP retrieval failed"
                  echo "CCN AFP retrieval failed" >> $MY_WORK/$logfile
                  exit 12
               fi
            fi   
           done

        ### convert to PDF

          echo "afp2pdf conversion begins"

          /a585/app/AFP2PDF_PLUS/afp2pdf.sh -i /a585/app/AFP2PDF_PLUS/a2pxopts2.cfg -n /a585/app/AFP2PDF_PLUS/font -o $MY_WORK/$notify_pdf $MY_WORK/$notify_afp > $MY_WORK/$afp2pdf_out 2>&1

          ReturnCode=$?
          if [ "$ReturnCode" != "0" ]
            then
             echo "afp2pdf failed"
             echo "afp2pdf failed" >> $MY_WORK/$logfile
             exit 12
          fi

        ### assign message text, subject, and reply address variables

          echo "assign message text, subject, reply"
          if [ "$cny_cd" = "US" ] && [ "$lang" = "EN" ] && [ "$pmt_typ" = "CC" ]
            then
               email_text=$MSG_PATH/ccnotifyusen.new
               email_reply="abx@xx.com"
               email_subject=" Credit Card Billing Adjustment. Ref# $ref_nr" 

             elif [ "$cny_cd" = "CA" ] && [ "$lang" = "EN" ] && [ "$pmt_typ" = "CC" ]
               then
                 email_text=$MSG_PATH/ccnotifycaen.new
                 email_reply="abx@xx.com"
                 email_subject="Credit Card Billing Adjustment. Ref# $ref_nr" 

             elif [ "$cny_cd" = "CA" ] && [ "$lang" = "FR" ] && [ "$pmt_typ" = "CC" ]
               then
                 email_text=$MSG_PATH/ccnotifycafr.new
                 email_reply="abx@xx.com"
                 email_subject=" Rajustement des frais. Ref. $ref_nr"

             elif [ "$cny_cd" = "US" ] && [ "$lang" = "EN" ] && [ "$pmt_typ" = "PP" ]
               then
                 email_text=$MSG_PATH/ppnotifyusen.new
                 email_reply="abx@xx.com"
                 email_subject=" Billing Adjustment. Ref# $ref_nr"

             elif [ "$cny_cd" = "CA" ] && [ "$lang" = "EN" ] && [ "$pmt_typ" = "PP" ]
               then
                 email_text=$MSG_PATH/ppnotifycaen.new
                 email_reply="abx@xx.com"
                 email_subject=" Billing Adjustment. Ref# $ref_nr"

             elif [ "$cny_cd" = "CA" ] && [ "$lang" = "FR" ] && [ "$pmt_typ" = "PP" ]
               then
                 email_text=$MSG_PATH/ppnotifycafr.new
                 email_reply="ssunkara@ups.com"
                 email_subject_text=$(cat "$MSG_PATH/ppsubjectcafr")
                 email_subject="$email_subject_text $ref_nr"

             else
               echo "invalid country, language, payment type combination: $cny_cd, $lang, $pmt_typ"
               echo "invalid country, language, payment type combination: $cny_cd, $lang, $pmt_typ" >> $MY_WORK/$logfile
               exit 12
          fi

        ### overlay reply address in .muttrc initialization file

          cd /a585/app/script/
          echo "email via NSGalinaMail"

          /usr/bin/java -jar NSGalinaMail.jar "$email_text"  "$email_subject" "$contact_email" "abc@xx.com" $lang  $cny_cd $MY_WORK/$notify_pdf
          if [ $? -eq 0 ]; then
              emailCountSuccess[$reccount-1]="Success: Email to $contact_email for $ref_nr" 
           else
              emailCountFailure[$reccount-1]="Failure: Email to $contact_email for $ref_nr" 
           fi

    done < "$file"
done
  • @kaylum- thanks for your help, but i am looking how can use my code (which has multipe files\instance) to run parallel, so that i can reduce the duration of execution. – sai prudhvi Jan 28 '20 at 10:47
  • What are you trying to accomplish with `cut -d ''`? Is that a copy paste error and you are actually doing `-d ' '` (setting the delimiter to a single space)? – William Pursell Jan 28 '20 at 15:46
  • Whatever it is you're trying to do with `-d`, stop. Don't use `cut` to parse the line. Instead, do something like `while read -r a b c d contac_email credit_card_id ...` – William Pursell Jan 28 '20 at 15:47
  • hi @william i am using the cut command to fetch the fields between 'VT'. sample data is ` myname VT email@xx.com VT credit_card_id VT ref_no CRLF` – sai prudhvi Jan 29 '20 at 04:06

2 Answers


If you want lots of stuff done in parallel, consider using GNU Parallel. There is a great PDF here explaining how to use it. Specifically, I was using "Section 9 - Pipe Mode" to answer your question.

I am not re-writing all your code for you, just showing you some ideas.

Let's generate a sample file of 16,000 lines to match yours:

seq 16000 > YourFile

And now let's generate a dummy script, called YourScript to process your data, like this:

#!/bin/bash
lines=$(wc -l < /dev/stdin)
echo "Called to process $lines lines"
sleep 2

As you can see, it just counts the lines it receives on its stdin, tells you how many there are, and then sleeps for 2s so you can see what is happening. Make it executable with:

chmod +x YourScript

Now, you can use GNU Parallel. First, let GNU Parallel split your file into chunks of 4,000 lines and pass one chunk to each of 4 jobs:

parallel --pipe -N4000 ./YourScript  < YourFile

Called to process     4000 lines
Called to process     4000 lines
Called to process     4000 lines
Called to process     4000 lines

If you have 4 or more CPU cores, that will have taken 2s because, by default, GNU Parallel starts one job per CPU core.

Now try passing 2,000 lines to each job, and running 4 jobs at a time:

parallel --pipe -j 4 -N2000 ./YourScript  < YourFile

Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines
Called to process     2000 lines

That will run the first 4 lots of 2,000 lines in 2s, then the second 4 lots of 2,000 lines in a further 2s.

Hopefully you can now see how to parallelise your script. Remember to read from stdin, not from a file! If you want your script to run with the filename of your 16,000-line file as a parameter, or the filename of a chunk of that file as chunked up by GNU Parallel, you could use:

parallel --pipe -N 2000 --cat YourScript {}

then it will write a temporary file with 2,000 lines, call your script with that filename, and delete the temporary file afterwards.
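If GNU Parallel is not available, the same chunk-and-run idea can be approximated with `split` and background jobs. This is a minimal sketch using only coreutils; `chunk.` is an arbitrary prefix chosen here:

```shell
#!/bin/bash
# Split the 16,000-line file into 2,000-line chunks, then run one
# background job per chunk and wait for them all to finish.
seq 16000 > YourFile
split -l 2000 YourFile chunk.          # produces chunk.aa .. chunk.ah
for f in chunk.*; do
    ( echo "Called to process $(wc -l < "$f") lines from $f" ) &
done
wait                                   # block until every chunk is done
```

Note that, unlike GNU Parallel, this launches all 8 jobs at once with no per-core throttling.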

Useful switches to GNU Parallel are:

  • parallel --dry-run ... which tells you what it would do without actually doing anything
  • parallel --bar ... which gives you a progress bar
  • parallel --eta ... which gives you an ETA

Note also that GNU Parallel can distribute work across other machines in your network, and it has fail and retry handling, output tagging and so on...

Also, you run cut 6 times for each line of your 16,000-line file - that means you have to fork nearly 100,000 processes! You can use IFS and read instead of those 6 processes:

IFS='|' read -r f1 f2 f3 <<< "a|b|c"
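Applied to the VT-delimited records described in the comments, a single `read` can replace all six `cut` calls. This is a sketch; the field names and the sample record are assumptions based on the question's `cut` field numbers:

```shell
#!/bin/bash
# Sample 13-field record with vertical-tab (VT, \v) separators.
record=$(printf 'name\vaddr\vcity\vstate\vme@xx.com\v12345\vREF001\vUS\vEN\vx\vy\vz\v3')

# One read splits the whole record; no subprocesses are forked.
IFS=$'\v' read -r f1 f2 f3 f4 contact_email credit_card_id ref_nr \
                 cny_cd lang f10 f11 f12 pmt_ir <<< "$record"

echo "$contact_email $ref_nr $pmt_ir"   # → me@xx.com REF001 3
```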
Mark Setchell


Maybe you could declare the tasks (### parse input record, ### set paypal or credit card, and so on) within a function:

proceed_tasks () {
  ### parse input record
  ### set paypal or credit card
}

then run the loop:

i=0
while read -r record
  do
    (proceed_tasks "$record") &
    i=$((i + 1))
    if (( i % 50 == 0 )); then wait; fi # Limit to 50 concurrent subshells.
 done < "$file"

Because each task runs in a background subshell, this creates as many subprocesses as needed, up to a limit of 50 at a time.
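A runnable version of that idea might look like this. It is a sketch: `process_record` is a placeholder for the parse/retrieve/convert/email steps, the input is faked with `seq`, and the counter `i` must be maintained explicitly:

```shell
#!/bin/bash
process_record() {
    # Placeholder for: parse fields, arsdoc get, afp2pdf, send email.
    echo "processed: $1" >> results.log
}

: > results.log                        # start with an empty log
i=0
while read -r record; do
    process_record "$record" &         # each record handled in the background
    i=$((i + 1))
    if (( i % 50 == 0 )); then wait; fi   # cap at 50 concurrent subshells
done < <(seq 5)                        # stand-in for the real input file
wait                                   # catch the last partial batch
```

One caveat: concurrent appends to a single log can interleave, so for real work each background job should write to its own file.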

Irsute