
I have an inotify shell script which monitors a directory and executes certain commands when a new file comes in. I need to parallelize this inotify script so that its execution doesn't wait for the processing to complete whenever multiple files come into the directory.

I have tried using nohup, & and xargs to achieve this. The problem with xargs was that it runs the same script as a number of processes, so whenever a new file comes in, all n running processes try to process it. But I only want one of the processes, whichever is idle, to process the new file. Something like a worker pool, where whichever worker is free or idle picks up the task.

This is my shell script.

#!/bin/bash
# script.sh
inotifywait --monitor -r -e close_write --format '%w%f' ./ | while read FILE
do
  echo "started script";
  sleep $(( $RANDOM % 10 ))s;
  # some more processing which takes time when a new file comes in
done

I did try to execute the script with xargs like this: xargs -n1 -P3 bash script.sh

So whenever a new file comes in, it gets processed three times because of -P3 (each of the three processes runs its own inotifywait on the same directory, so every new file is seen by all of them), but ideally I want whichever process is idle to pick up the task.

Could you please shed some light on how to approach this problem?

  • The shell does not provide access to threading at all. I edited your question to rephrase the problem statement without incorrect assumptions about the processing model. – tripleee Oct 01 '19 at 05:27
  • @tripleee thanks, I didn't know the shell didn't have multithreading, but glad you understood the gist of the question. – Beeti Sushruth Oct 01 '19 at 06:15

2 Answers


There is no reason to have a pool of idle processes. Just run one per new file when you see new files appear.

#!/bin/bash
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
while read -r file
do
  echo "started script";
  ( sleep $(( $RANDOM % 10 ))s
  #some more process which takes time when a new "$file" comes in
  )  &
done

Notice the addition of & and the parentheses, which group the sleep and the subsequent processing into a single subshell that we can then background.

Also, notice how we always prefer read -r, and see Correct Bash and shell script variable capitalization for why FILE was changed to file.
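
As a quick illustration of why -r matters (a small sketch, not part of the original answer): without it, read treats backslashes in the input as escape characters, so file names containing them get mangled.

# Hypothetical demo: a file name containing a backslash
printf 'a\\ b\n' | while read f; do echo "$f"; done      # prints: a b
printf 'a\\ b\n' | while read -r f; do echo "$f"; done   # prints: a\ b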

  • so essentially I'm creating a process for every new file. Is there a way I can restrict the number of processes created? For example, if 200 new files come in at the same time, I don't want to create 200 processes. Maybe 5 processes, and the rest of the files have to wait until those 5 finish. And isn't creating too many processes CPU intensive? Thanks. – Beeti Sushruth Oct 01 '19 at 06:25
  • https://stackoverflow.com/questions/1537956/bash-limit-the-number-of-concurrent-jobs – or, if the process can handle more than a single file at a time, schedule them in batches of, say, 20 tops. (You might still want to limit the number of concurrent jobs; see the sketch after these comments.) – tripleee Oct 01 '19 at 06:52
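
A minimal sketch of that job-limiting approach, not part of the original answer; it assumes bash 4.3 or later for wait -n and caps the pool at 5 concurrent workers.

#!/bin/bash
# Sketch: process each new file in the background, but never run
# more than max_jobs workers at once (requires bash >= 4.3 for wait -n).
max_jobs=5
inotifywait --monitor -r -e close_write --format '%w%f' ./ |
while read -r file
do
  # If the pool is full, block until any one background job exits.
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n
  done
  ( sleep $(( $RANDOM % 10 ))s
    # some more processing of "$file" here
  ) &
done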

Maybe this will work:

https://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-dir-processor

If you have a dir in which users drop files that need to be processed, you can do this on GNU/Linux (if you know what inotifywait is called on other platforms, file a bug report):

inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
  parallel -u echo

This will run the command echo on each file put into my_dir or subdirs of my_dir.

To run at most 5 processes use -j5.
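
Putting those together, a sketch that runs a hypothetical per-file script (process_one.sh, not part of the original answer) with at most 5 jobs at a time:

inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |
  parallel -u -j5 bash process_one.sh

GNU parallel reads one file name per line and appends it as an argument, so each invocation becomes bash process_one.sh path/to/newfile, and -j5 ensures no more than five run concurrently.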
