
I made a loop in my bash script.

The important point is that the loop must wait for some processes (here, Scrapy spiders) to finish before incrementing the variables that serve as its conditions.

The general algorithm is the following (pseudocode, not a real programming language):

#initialisation
count=0
urlsFileNbLines=$( wc -l < urlsFileToScrape )
while count =<5 or urlsFileNbLines != 0
    launch scrapy spiders
    wait scrapy spiders are done
         add 1 (loop) to $count
         update $urlsFileNbLines
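
For reference, the pseudocode above could be sketched in Bash roughly as follows. This is only a sketch under assumptions: `run_spider` is a hypothetical stand-in for `scrapy crawl spider -a param=N`, a temporary file stands in for urlsFileToScrape, and the loop is assumed to stop as soon as either limit is reached (five rounds, or no URLs left).

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `scrapy crawl spider -a param=N`, so the sketch runs anywhere.
run_spider() { sleep 0.1; }

# Temporary file standing in for urlsFileToScrape (assumption for the demo).
urls_file=$(mktemp)
printf 'u1\nu2\nu3\n' > "$urls_file"

count=0
lines=$(wc -l < "$urls_file")
# Assumed stop rule: at most 5 rounds, and only while URLs remain.
while [ "$count" -lt 5 ] && [ "$lines" -ne 0 ]; do
    pids=()
    for i in 1 2 3; do
        run_spider "$i" &      # launch in the background
        pids+=("$!")           # remember its PID
    done
    wait "${pids[@]}"          # block until every job of this round has exited

    count=$((count + 1))
    # Stand-in for the spiders consuming URLs: drop the first line of the file.
    tail -n +2 "$urls_file" > "$urls_file.tmp" && mv "$urls_file.tmp" "$urls_file"
    lines=$(wc -l < "$urls_file")
done
rm -f "$urls_file"
echo "rounds: $count, remaining lines: $lines"
```

Because `wait` runs in the foreground, the counter and the line count are only updated once the whole round has finished.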

The problem is this: if I don't wait for the processes to finish (until the Scrapy spiders are done) and increment the variables right away, the loop launches the spiders again, whereas I must wait for the previous ones to finish before updating $urlsFileNbLines.


Now for the Bash part.

To implement the condition "until scrapy spiders are done", I was inspired by this. What I understand from "Bash shell script to check running process" is that if pgrep -x scrapy returns something, the condition is implicitly true. So I tried to write a condition that waits until it becomes false. That is why I tried until [ ! pgrep -x scrapy ]; do ... and even until [ ! $(pgrep -x scrapy) ]; do, but both always give errors.
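
A note on why those attempts fail: `[ ... ]` is the `test` command, which compares strings and numbers and does not run the words inside it as a command. A loop can instead test the exit status of `pgrep` directly, since `pgrep` exits with 0 while at least one process matches. The sketch below assumes a hypothetical helper name, `wait_until_gone`, and uses a made-up process name for the demo; whether `pgrep -x scrapy` matches the spiders on a given system is taken from the question, not verified here.

```bash
# Hypothetical helper: block until no process with exactly this name is left.
wait_until_gone() {
    local name=$1
    # Loop on pgrep's exit status (0 = at least one match); discard the PID list.
    while pgrep -x "$name" > /dev/null; do
        sleep 1
    done
}

# Demo with a made-up name that should match nothing, so this returns at once.
wait_until_gone "nosuchproc"
status=$?
echo "status: $status"
```

In the script from the question, the call would be `wait_until_gone scrapy`. Polling by name also catches unrelated scrapy processes, so waiting on the PIDs the script itself started is usually the more precise option.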

I tried this too:

At the "launch scrapy spiders" step, there is:

for i in `seq 1 5`; do
    scrapy crawl spider -a param=$i & PID$i=$! &
done
echo "here it must wait the end of process. Count value is ${count}"

At the "until scrapy spiders are done" step (just below the previous one), there is:

for i in `seq 1 5`; do
    wait $PID$i
done
count=$(($count+1))
...

But it does not wait: $count is incremented very quickly and goes past 5, because the script keeps incrementing before the loop for i in seq $1 $maxSeq has finished. Meanwhile, the & PID$i=$! & part produces the error script.sh: line 93: PID1=4758 : command not found. It's a mess.
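
Some background on that error: Bash treats a word as an assignment only when the text before `=` is a literal variable name. `PID$i=$!` contains an expansion, so after expansion the word `PID1=4758` is looked up as a command name, hence "command not found". An array avoids building variable names at all. The following is a minimal sketch, with `run_spider` as a hypothetical stand-in for `scrapy crawl spider -a param=$i`:

```bash
# Hypothetical stand-in for `scrapy crawl spider -a param=N`.
run_spider() { sleep 0.2; }

pids=()
for i in 1 2 3 4 5; do
    run_spider "$i" &      # a single & puts the job in the background
    pids[i]=$!             # plain foreground assignment, right after the launch
done

count=0
for pid in "${pids[@]}"; do
    wait "$pid"            # no trailing &: wait has to run in the foreground
done
count=$((count + 1))       # only reached once all five jobs have exited
echo "all ${#pids[@]} jobs finished, count=$count"
```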

What can I do?


UPDATE

Thanks to @Barmar I came up with this solution. To wait for all the processes, I took inspiration from this topic.

pid=() #an empty array for the pids that are coming
for i in `seq 1 5`; do
    scrapy crawl spider -a param=$i ; pid[$i]=$! &
done
echo "command pgrep before wait" 
echo $(pgrep -x scrapy) 
for pid in ${pid[*]}; do
    wait $pid & echo 'fin du processus ${pid}' &
done
echo "command pgrep after wait"
echo $(pgrep -x scrapy)

It does not wait for all the processes to finish before incrementing the variables. It then launches the same spider instances again, which creates connection conflicts.
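
Two details in the snippet above are worth isolating. A command followed by `&` runs in a background subshell, so an assignment such as `pid[$i]=$! &` never reaches the parent shell, and `wait $pid &` does not block the script. In addition, single quotes prevent `${pid}` from being expanded. The small demo below illustrates both effects with made-up values; it is an illustration, not the script from the question.

```bash
x=before
x=after &                   # the assignment happens in a background subshell
wait                        # let that subshell finish
echo "x is $x"              # the parent shell still sees the old value

pid=1234                    # hypothetical value, for illustration only
single='process ${pid}'     # single quotes: the text is kept literally
double="process ${pid}"     # double quotes: the variable is expanded
echo "$single"
echo "$double"
```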

Writing wait "${pids[@]}" instead of the loop works perfectly in my case.


bash version: 4.4.20 | OS version: Ubuntu 18.04.3 LTS.

AvyWam
  • Did you check the return of `wait`? – Matthieu Sep 05 '19 at 20:37
  • You can't combine variables like `$PID$i`. Use an array. – Barmar Sep 05 '19 at 20:38
  • If you just use `wait` with no arguments it will wait for all background processes to complete, so you don't need the loop. – Barmar Sep 05 '19 at 20:38
  • @Barmar Why can't I? It clearly instantiated them well: `PID1=4758`, `PID2=6695` and so on. So I can write `$PID$i=$!`. – AvyWam Sep 05 '19 at 20:40
  • Because it doesn't combine variables that way. `$PID$i` means to get the value of `$PID` and the value of `$i` and concatenate them, it doesn't mean to append `$i` to the variable name and look up `$PID1`, `$PID2`, etc. – Barmar Sep 05 '19 at 20:42
  • If you want to calculate a variable name dynamically, you need to use `${!PID$i}`. – Barmar Sep 05 '19 at 20:43
  • And to assign to a calculated variable, you have to use `eval`: `eval "PID$i=$!"` – Barmar Sep 05 '19 at 20:45
  • It's much easier to use an array: `PID[$i]=$!` and `${PID[$i]}` – Barmar Sep 05 '19 at 20:45
  • @Barmar With or without `${!PID$i}`, it will be the same. I need a different PID for each spider in the loop, so `$i` is essential. I am going to try an array then, even if the `wait` loop needs some thought. – AvyWam Sep 05 '19 at 20:46
  • The message `PID1=4758 : command not found` is because you can't instantiate it. – Matthieu Sep 05 '19 at 20:46
  • `man bash` then `/nameref` is also an option. – David C. Rankin Sep 05 '19 at 21:48
  • @Barmar Well, thanks a lot. I have updated my question. Your solution works for storing the PIDs, because I no longer get any errors. – AvyWam Sep 05 '19 at 21:58
  • @l0b0 Read more carefully next time. This is not a duplicate at all, and certainly not of the topic you linked it to. – AvyWam Sep 05 '19 at 22:43
  • I replaced the duplicate link with a more appropriate one. The use of a `pids` array, as shown in the accepted answer there (and in Barmar's comment above), is *exactly* what you should be doing. – Charles Duffy Sep 05 '19 at 22:51
  • @CharlesDuffy yes it is. – AvyWam Sep 05 '19 at 22:52
  • @CharlesDuffy I still have trouble, as I said in the UPDATE. I switched to `for pid in "${pids[@]}"` as you said in the linked topic, but I don't know why the following instructions run even though the processes are not done. – AvyWam Sep 06 '19 at 10:40
  • You can't put a `&` after the `wait`; to be useful, it **must** be a foreground process. – Charles Duffy Sep 06 '19 at 13:42
  • @CharlesDuffy I replaced the loop over `${pid[*]}` with simply `wait "${pids[@]}"` and it now works perfectly, as expected. – AvyWam Sep 06 '19 at 14:59
  • You lose information when you do that; wait one-at-a-time and you get individual exit status for each. – Charles Duffy Sep 06 '19 at 15:00
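
To illustrate the last comment, waiting on each PID in turn makes the exit status of every job available. The sketch below is a minimal example under assumptions: `run_spider` is a hypothetical stand-in for the real `scrapy crawl` call, rigged so that job 2 fails.

```bash
# Hypothetical stand-in: job N exits with status 1 when N is 2, otherwise 0.
run_spider() { sleep 0.1; [ "$1" -ne 2 ]; }

pids=()
for i in 1 2 3; do
    run_spider "$i" &
    pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
    # wait returns the exit status of the job it waited for.
    if wait "$pid"; then
        echo "job $pid succeeded"
    else
        echo "job $pid failed with status $?"
        failed=$((failed + 1))
    fi
done
echo "failed jobs: $failed"
```

By contrast, a single `wait "${pids[@]}"` only reports the status of the last PID in the list.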
