
I am using aria2 to download some data, with the --on-download-complete option to automatically run a bash script that processes the data.

aria2c --http-user='***' --http-passwd='***' --check-certificate=false --max-concurrent-downloads=2 -M products.meta4 --on-download-complete=/my/path/script_gpt.sh

Focusing on my bash script,

#!/bin/bash

oldEnd=.zip
newEnd=_processed.dim

for i in $(ls -d -1 /my/path/S1*.zip)
do
if [ -f ${i%$oldEnd}$newEnd ]; then 
   echo "Already processed"
else
   gpt /my/path/graph.xml -Pinput1=$i -Poutput1=${i%$oldEnd}$newEnd
fi
done 

Basically, every time a download finishes, the for loop starts. It first checks whether the downloaded product has already been processed and, if not, runs a specific task.

My issue is that every time a download completes, the bash script is run. If the analysis from the previous run has not finished yet, both tasks overlap and eat all my memory.

Ideally, I would like to:

  • Each time the bash script is run, check whether there is still an ongoing process.

  • If so, wait until it has finished and only then run.

It's like creating a queue of tasks (as in a for loop where each iteration waits until the previous one has finished).

I have tried to implement a solution with `wait` or by identifying the PID, but nothing has been successful.

Or maybe I should change the approach: instead of having aria2 trigger the processing of the data it has just downloaded, implement another solution?

GCGM
  • You could try a file lock in the beginning of the script and wait or exit if the file is locked. – Poshi Mar 22 '19 at 10:53
  • Any example of how to implement it? Not familiar with this type of implementation – GCGM Mar 22 '19 at 10:59
  • I think that something similar to this would work: `aria2c --http-user='***' --http-passwd='***' --check-certificate=false --max-concurrent-downloads=2 -M products.meta4 --on-download-complete="flock -x /tmp/aria.lock /my/path/script_gpt.sh"` – Poshi Mar 22 '19 at 11:06
  • Getting the following error: `Could not execute user command: flock -x /tmp/aria.lock /my/path/script_gpt.sh: No such file or directory` – GCGM Mar 22 '19 at 11:23
  • hummm... check if `flock` exists in your system: `which flock` – Poshi Mar 22 '19 at 11:48
  • yes, getting `/usr/bin/flock` – GCGM Mar 22 '19 at 11:50
  • Maybe it is trying to execute the command without parsing the parameters. Try to encapsulate that in a script so you only need to pass aria2 a single command, with no parameters (see the wrapper sketch after these comments). – Poshi Mar 22 '19 at 12:00
  • not very sure what you mean by that (as context, I do not have a lot of experience programming in `bash`) – GCGM Mar 22 '19 at 12:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/190505/discussion-between-poshi-and-gcgm). – Poshi Mar 22 '19 at 13:40
  • See [BashFAQ/045 (How can I ensure that only one instance of a script is running at a time (mutual exclusion, locking)?)](https://mywiki.wooledge.org/BashFAQ/045). – pjh Mar 22 '19 at 19:11
  • "First it checks if the downloaded product has been already processed". So you re-reading the same dir each time? (I'm guessing/assuming). Better to move finished files to a 'DONE' directory. Good luck – shellter Mar 23 '19 at 23:23

1 Answer


You can try to acquire an exclusive file lock and only run when the lock is released. Your code could look like this:

#!/bin/bash

oldEnd=.zip
newEnd=_processed.dim

{
    # Block here until an exclusive lock on fd 200 can be acquired
    flock -e 200

    # Read the NUL-delimited file list produced by find below
    while IFS= read -r -d '' i
    do
        if [ -f "${i%$oldEnd}$newEnd" ]
        then
            echo "Already processed"
        else
            gpt /my/path/graph.xml -Pinput1="$i" -Poutput1="${i%$oldEnd}$newEnd"
        fi
    done < <(find /my/path -maxdepth 1 -name "S1*.zip" -print0)
} 200> /tmp/aria.lock

This code acquires an exclusive lock on file descriptor 200 (the one we told bash to open and redirect to the lock file) and prevents other scripts from executing the code block until the file is closed. The file is closed as soon as the code block finishes, which lets the next waiting process continue execution.

BTW, you should always quote your variables, and you should avoid parsing `ls` output. Also, to avoid problems with whitespace and unexpected globbing, have `find` print the file list separated by NUL characters and read it back with `read -d ''`.
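
If you would rather have a later invocation skip its run instead of queuing behind the current one (the "wait or exit" option mentioned in the comments), a non-blocking variant is a small change. A minimal sketch, reusing the same lock file:

#!/bin/bash
# Sketch: give up immediately if another instance already holds the lock,
# instead of waiting for it to be released.
exec 200> /tmp/aria.lock
if ! flock -n -e 200; then
    echo "Another instance is running; skipping." >&2
    exit 0
fi

# ... processing loop as in the script above ...

With `flock -n` the call fails immediately when the lock is held, so overlapping invocations exit instead of piling up.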

Poshi