
I have a shell script, say data.sh, which takes a single argument, say Table_1.

I have a test file that is produced by a different script.

This test file contains more than 1,000 arguments to pass to the script, one per line.

The file looks like below:

Table_1
Table_2
Table_3
Table_4
...and so on

Now I want to run the script for all of these arguments in parallel.

I am doing this with cron jobs.

First, I split the test file into 20 parts using the split command in Linux:

    split -l $(( $(wc -l < test) / 20 + 1 )) test

This divides the test file into 20 parts named xaa, xab, xac, and so on.

Then I run the cron jobs:

* * * * * while IFS= read -r a; do /home/XXXX/data.sh "$a"; done < /home/xxxx/xaa
* * * * * while IFS= read -r a; do /home/XXXX/data.sh "$a"; done < /home/xxxx/xab
and so on.

As this involves a lot of manual work, I would like to do it dynamically.

Here is what I want to achieve:

1) As soon as I get the test file, I would like it to be split into, say, 20 files automatically and stored in a particular place.

2) Then I would like to schedule a cron job for every day at 5 AM that passes the 20 files as arguments to the script.

What is the best way to implement this? Any answer with an explanation would be appreciated.


1 Answer


Here is what you could do. Create two cron jobs:

  1. file_splitter.sh -> splits the file and stores the pieces in a particular directory
  2. file_processor.sh -> picks up one file at a time from that directory, does a read loop, and calls data.sh. Removes the file after successful processing.

Schedule file_splitter.sh to run ahead of file_processor.sh.
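A minimal sketch of both scripts, assuming the incoming file lands at /home/xxxx/test and the pieces are stored under /home/xxxx/parts (both paths, and the hard-coded count of 20, are placeholders to adjust):

    #!/bin/bash
    # file_splitter.sh -- minimal sketch; INFILE and PARTS_DIR are assumed paths
    INFILE=/home/xxxx/test            # where the incoming test file lands
    PARTS_DIR=/home/xxxx/parts        # where the split pieces are stored

    mkdir -p "$PARTS_DIR"
    [ -s "$INFILE" ] || exit 0        # nothing to do if the file is missing or empty

    # split into ~20 equal pieces: parts/aa, parts/ab, ...
    split -l $(( $(wc -l < "$INFILE") / 20 + 1 )) "$INFILE" "$PARTS_DIR/"
    rm -f "$INFILE"                   # consume the input so it isn't split again

    #!/bin/bash
    # file_processor.sh -- minimal sketch; the default directory is an assumption.
    # An optional argument selects a subdirectory (used by the parallel variant below).
    PARTS_DIR=${1:-/home/xxxx/parts}

    for f in "$PARTS_DIR"/*; do
        [ -f "$f" ] || continue                # skip if the glob matched nothing
        while IFS= read -r a; do
            /home/XXXX/data.sh "$a"
        done < "$f" && rm -f "$f"              # remove only if processing ended successfully
    done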

If you want to achieve further parallelism, you can make file_splitter.sh write the split files into multiple directories with a few files in each. Let's say they are called sub1, sub2, etc. Then, you can schedule multiple instances of file_processor.sh and pass the sub directory name as an argument. Since the split files are stored in separate directories, we can ensure that only one job processes the files in a particular subdirectory.
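A sketch of that variant, assuming four subdirectories (the count, the sub* names, and the staging step are all arbitrary choices):

    #!/bin/bash
    # parallel variant of file_splitter.sh -- a sketch; N and all paths are assumptions
    INFILE=/home/xxxx/test
    PARTS_DIR=/home/xxxx/parts
    N=4

    [ -s "$INFILE" ] || exit 0
    staging=$(mktemp -d)
    split -l $(( $(wc -l < "$INFILE") / 20 + 1 )) "$INFILE" "$staging/"

    # deal the pieces round-robin into sub1..subN so that each
    # file_processor.sh instance owns one subdirectory exclusively
    i=0
    for f in "$staging"/*; do
        d="$PARTS_DIR/sub$(( i % N + 1 ))"
        mkdir -p "$d"
        mv "$f" "$d/"
        i=$(( i + 1 ))
    done
    rmdir "$staging"
    rm -f "$INFILE"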

It's better to keep the cron command as simple as possible.

* * * * * /path/to/file_processor.sh

is better than

* * * * * while IFS=',' read a;do /home/XXXX/data.sh $a;done < /home/xxxx/xab

Makes sense?
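To tie this back to the 5 AM requirement in the question, the crontab could look something like this (the paths and the five-minute head start for the splitter are assumptions):

    # split a few minutes ahead of the processors
    55 4 * * * /path/to/file_splitter.sh
    # one processor instance per subdirectory, all at 5 AM
    0 5 * * * /path/to/file_processor.sh /home/xxxx/parts/sub1
    0 5 * * * /path/to/file_processor.sh /home/xxxx/parts/sub2
    0 5 * * * /path/to/file_processor.sh /home/xxxx/parts/sub3
    0 5 * * * /path/to/file_processor.sh /home/xxxx/parts/sub4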

I wrote a post about how to manage cron jobs effectively. You may want to take a look at it:

Managing log files created by cron jobs
