I have a bash script that parses log files, aggregating data in an AWK array, and takes part of the file path as a parameter. It runs fine, and I can run multiple instances in the background manually. The trouble is that I can't figure out how to avoid invoking the script manually for each parameter in my list.

Depending on where I've put the & it either runs the instances serially or tries to run all the jobs at once (I don't want to see a load average of 9999 again).

script.sh param1 &

script.sh param2 & ... #works fine 

script.sh < params.txt & ... #runs serially

Placing & at various places within the script had some undesirable outcomes.

hub=$1
while read date; do
    zgrep ^1 /logarchive/http/${hub}pr*/$date*.gz|\
    awk -F'[ ,]' '{print$34,$(NF-6),$6,$(NF-7)}'|\
    awk 'NR>1{bytesDown[$1 " " $2] += $3; bytesUp[$1 " " $2] += $4} END {for (i in bytesDown) print i, bytesDown[i], bytesUp[i]}'\
    > ${hub}.$date.txt
done < dates.txt

I'd like to run an instance in the background for each parameter in a file.

brunorey
imac
  • Personally, btw, I would write this with just one awk invocation, like the example at https://gist.github.com/charles-dyfis-net/955dcecaad1b11575deb6713f85efa49 -- not incorporating that change into my answer because it can't be tested without input samples, which aren't included in the question. – Charles Duffy Feb 08 '19 at 16:07

1 Answer

Use export -f to export a function, and then you can call it in parallel from shells started by xargs -P; in the example below, numjobs indicates how many dates you want to run concurrently.

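# hub is set beforehand (e.g. hub=$1); it must be exported so that the
# shells started by xargs can see it
export hub

myfunc() {
    date=$1
    # the globs (pr* and *.gz) stay outside the quotes so they still expand
    zgrep ^1 /logarchive/http/"${hub}"pr*/"$date"*.gz | \
    awk -F'[ ,]' '{print$34,$(NF-6),$6,$(NF-7)}'    | \
    awk '
      NR>1{
        bytesDown[$1 " " $2] += $3
        bytesUp[$1 " " $2] += $4
      }
      END {
        for (i in bytesDown) print i, bytesDown[i], bytesUp[i]
      }
    ' >"${hub}.$date.txt"
}
export -f myfunc
numjobs=8

# xargs reads the dates, one per line, from dates.txt
xargs -P "$numjobs" -n 1 bash -c 'myfunc "$@"' _ <dates.txt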
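
The same `xargs -P` pattern might also cover the original goal of starting one instance of the script per parameter without hitting a huge load average. This is only a minimal sketch: `run_per_param` is a hypothetical helper name, and it assumes the parameter file holds one whitespace-free parameter per line and that the local `xargs` supports `-P` (GNU and BSD versions do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND once per line of PARAMS_FILE,
# with at most NUMJOBS copies running at the same time.
# usage: run_per_param PARAMS_FILE NUMJOBS COMMAND [ARGS...]
run_per_param() {
    local params_file=$1 numjobs=$2
    shift 2
    # -n 1: append one parameter to each invocation of COMMAND
    # -P:   upper bound on the number of concurrent invocations
    xargs -P "$numjobs" -n 1 "$@" < "$params_file"
}
```

With the files named in the question, this would be called as `run_per_param params.txt 4 ./script.sh`, where the limit of 4 is an arbitrary value to tune for the machine.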
Charles Duffy
  • Thanks for the awk suggestions. What does the final line do, from -n 1 onwards? – imac Nov 04 '19 at 11:43
  • Learned another lesson in the meantime: comments are editable for 5 minutes only! @charles-duffy Thanks for the awk suggestions in the link. What does each element of the final line above do, from -n 1 onwards? Originally, I was reading the dates from a file and passing 'hub' to my script; is there a reason you changed date to be $1, and where would it get 'hub' from? Does the function have to be input at the CLI each time? Can that be saved as a script? – imac Nov 04 '19 at 11:54
  • `date` is the `$1` *of `myfunc`*, as functions have their own argument lists. That doesn't mean that `$1` *outside* `myfunc` changes, so you can still set `hub=$1` before `myfunc` is run. – Charles Duffy Nov 04 '19 at 16:03
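
To illustrate the point about separate argument lists, here is a small sketch. The names `show_args`, `hubA` and the date are made up for the demonstration, and `set --` merely simulates the script having been started as `script.sh hubA`.

```shell
#!/usr/bin/env bash
set -- hubA        # simulate: script.sh hubA  (hubA is a made-up value)
hub=$1             # the script's own $1

show_args() {      # hypothetical stand-in for myfunc
    local date=$1  # the function's $1, independent of the script's $1
    echo "hub=$hub date=$date"
}

result=$(show_args 20190208)   # made-up date
echo "$result"                 # prints: hub=hubA date=20190208
echo "script arg is still: $1" # prints: script arg is still: hubA
```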
  • ...as for the final line -- `-n 1` tells `xargs` to start a separate copy of `myfunc` for each input item (here, each date), vs passing a single `myfunc` several jobs as separate arguments (which it would need to be written to handle). `bash -c` runs bash, with the next argument as the source to execute as a script, and the arguments after that being `$0`, `$1`, etc. Passing `"$@"` to `myfunc` means that the arguments (`$1` and on) passed to the copy of `bash` are in turn passed through to `myfunc`. – Charles Duffy Nov 04 '19 at 16:06
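
The argument handling described in that comment can be observed directly with a throwaway command in place of `myfunc`. This is just an illustrative sketch with made-up dates as input; the `_` is a placeholder that fills `$0`, so each input item lands in `$1`.

```shell
#!/usr/bin/env bash
# Each input item becomes one invocation of: bash -c 'SCRIPT' _ ITEM
# Without -P, xargs runs the invocations one after another, in input order.
out=$(printf '%s\n' 2019-02-01 2019-02-02 |
      xargs -n 1 bash -c 'echo "arg0=$0 arg1=$1"' _)
echo "$out"
# prints:
# arg0=_ arg1=2019-02-01
# arg0=_ arg1=2019-02-02
```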