0

I've been trying to run cURL in a huge loop, launching each cURL call as a background process from bash; there are about 904 domains to be cURLed.

The problem is that not all 904 domains get processed, apparently because of the PID limit in the Linux kernel. I have tried raising pid_max to 4194303 (I read about it in this discussion: Maximum PID in Linux), but when I checked, only 901 of the domains had been run as background processes; before I raised pid_max, only around 704 were running as background processes.
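
For reference, this is roughly how I checked and raised the limit (kernel.pid_max is the standard sysctl on Linux; shown here only as a sketch):

# check the current limit, then raise it as root
cat /proc/sys/kernel/pid_max
sudo sysctl -w kernel.pid_max=4194303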

Here is my loop code:

count=0
while IFS= read -r line || [[ -n "$line" ]]; do
  (curl -s -L -w "\\n\\nNo:$count\\nHEADER CODE:%{http_code}\\nWebsite : $line\\nExecuted at :$(date)\\n==================================================\\n\\n" -H "X-Gitlab-Event: Push Hook" -H 'X-Gitlab-Token: '$SECRET_KEY --insecure $line >> output.log) &
  (( count++ ))
done < $FILE_NAME

Does anyone have another solution, or a fix, for handling a huge loop that runs cURL as background processes?

0x00b0
  • 343
  • 1
  • 3
  • 17
  • xargs' `-P` option may help, to limit the number of processes at a time – Nahuel Fouilleul Jul 11 '19 at 08:08
  • Can you explain it? – 0x00b0 Jul 11 '19 at 08:14
  • I added the comment before the code was posted; it seems it won't be easy because of the count variable. Otherwise it could be `xargs -n1 -P50 bash -c '....' - < "$FILE_NAME"`, where 50 is the number of running processes at a time, `....` is the command to execute, `"$1"` is the argument to be used, and the `-` is for `"$0"` – Nahuel Fouilleul Jul 11 '19 at 08:18
  • `--process-slot-var=count` can be used to pass the index, as in [this answer](https://unix.stackexchange.com/a/449225/23266) – Nahuel Fouilleul Jul 11 '19 at 08:25

2 Answers

2

A script `example.sh` can be created:

#!/bin/bash

# $count is set in each worker's environment by xargs --process-slot-var=count
line=$1
curl -s -L -w "\\n\\nNo:$count\\nHEADER CODE:%{http_code}\\nWebsite : $line\\nExecuted at :$(date)\\n==================================================\\n\\n" -H "X-Gitlab-Event: Push Hook" -H 'X-Gitlab-Token: '$SECRET_KEY --insecure $line >> output.log

Then the command could be (limiting the number of running processes at a time to 50):

xargs -n1 -P50 --process-slot-var=count ./example.sh < "$FILE_NAME"
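
(A rough note on the setup this assumes: the helper script has to be executable, and SECRET_KEY has to be exported so that the child processes started by xargs can see it.)

# one-time setup before running the xargs command above
chmod +x example.sh
export SECRET_KEY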
Nahuel Fouilleul
  • 18,726
  • 2
  • 31
  • 36
2

Even if you could run that many processes in parallel, it's pointless - starting that many DNS queries to resolve 900+ domain names in a short span of time will probably overwhelm your DNS server, and having that many concurrent outgoing HTTP requests at the same time will clog your network. A much better approach is to trickle the processes so that you run a limited number (say, 100) at any given time, but start a new one every time one of the previously started ones finishes. This is easy enough with xargs -P.

xargs -I {} -P 100 \
    curl -s -L \
        -w "\\n\\nHEADER CODE:%{http_code}\\nWebsite : {}\\nExecuted at :$(date)\\n==================================================\\n\\n" \
        -H "X-Gitlab-Event: Push Hook" \
        -H "X-Gitlab-Token: $SECRET_KEY" \
        --insecure {} <"$FILE_NAME" >output.log

The $(date) result will be interpolated at the time the shell evaluates the xargs command line, and there is no simple way to get the count with this mechanism. Refactoring this to put the curl command and some scaffolding into a separate script could solve these issues, and should be trivial enough if it's really important to you. (Rough sketch:

xargs -P 100 bash -c 'count=0; for url; do
        curl --options --headers "X-Notice: use double quotes throughout" \
            "$url"
        ((count++))
    done' _ <"$FILE_NAME" >output.log

... though this will restart numbering if xargs receives more URLs than will fit on a single command line.)
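
For what it's worth, here is a rough sketch of one way to keep both a true running number and a per-request timestamp: number the lines before handing them to xargs. It assumes GNU tools, that SECRET_KEY is exported, and that the URLs contain no whitespace:

# prefix each line with its line number, then hand "number url" pairs to the workers
export SECRET_KEY
nl -ba -w1 -s' ' "$FILE_NAME" |
    xargs -n2 -P 100 bash -c '
        count=$1 url=$2
        curl -s -L \
            -w "\n\nNo:$count\nHEADER CODE:%{http_code}\nWebsite : $url\nExecuted at :$(date)\n==================================================\n\n" \
            -H "X-Gitlab-Event: Push Hook" \
            -H "X-Gitlab-Token: $SECRET_KEY" \
            --insecure "$url"
    ' _ >>output.log

Because the per-URL script runs in its own bash -c invocation, $(date) is evaluated when each request is made rather than once when the outer command line is parsed.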

tripleee
  • 175,061
  • 34
  • 275
  • 318
  • TIL `xargs --process-slot-var` exists but it's a relatively new feature and GNU only. If you have it, by all means use it. – tripleee Jul 11 '19 at 09:13