I am trying to run a cron job that executes my shell script, which contains Hive and Pig scripts. I have set the cron job to run every 2 minutes, but the cron job fires again before my shell script has finished. Will this overlap affect my results, or will the next run start only once the previous script finishes executing? I am in a bit of a dilemma here. Please help. Thanks.
- You want your cron job to start only after the previous execution of the same script is over? – Fazlin Jul 07 '16 at 14:26
- Create a lock file for the execution. Based on your requirements, you can either simply skip the run if the previous one has not yet finished (I personally like this) or wait for the lock to be released. – satish Jul 07 '16 at 14:28
- @Fazlin Yes, that is the flow I want. – Ironman Jul 07 '16 at 14:28
- @satish Can you please provide an example that would help me understand better? – Ironman Jul 07 '16 at 14:29
- http://stackoverflow.com/questions/2366693/run-cron-job-only-if-it-isnt-already-running – sandeep rawat Jul 07 '16 at 14:45
- You can use flock too. – sandeep rawat Jul 07 '16 at 14:47
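A minimal sketch of the flock approach mentioned in the comment above, assuming the util-linux flock utility is available (the lock path and script path are placeholders):

# crontab entry: run every 2 minutes, but skip this run if the previous
# one still holds the lock (-n fails immediately instead of waiting)
*/2 * * * * /usr/bin/flock -n /tmp/myjob.lock /path/to/your_script.sh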
2 Answers
I think there are two ways to resolve this: a long way and a short way.
Long way (probably most correct):
Use something like Luigi to manage job dependencies, then run it with cron (it won't run more than one instance of the same job).
Luigi will handle all of your job dependencies for you, and you can make sure that a particular job executes only once. It's a little more work to get set up, but it's really worth it.
Short way:
Lock files have already been mentioned, but you can do this on HDFS too; that way it doesn't depend on which machine you run the cron job from.
Instead of checking for a lock file, put a flag on HDFS when you start and finish the job, and make this a standard part of all of your cron jobs:
# at start
hadoop fs -touchz /jobs/job1/2016-07-01/_STARTED
# at finish
hadoop fs -touchz /jobs/job1/2016-07-01/_COMPLETED
# then check them before running (hadoop fs -test -e returns 0 if the path exists)
if ! hadoop fs -test -e /jobs/job1/2016-07-01/_STARTED &&
   ! hadoop fs -test -e /jobs/job1/2016-07-01/_COMPLETED; then
    run_job                                             # your Hive/Pig steps
    hadoop fs -touchz /jobs/job1/2016-07-01/_COMPLETED  # add_completed
    hadoop fs -rm /jobs/job1/2016-07-01/_STARTED        # remove_started
fi
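In a real cron job the date component would presumably be derived at run time rather than hard-coded. A minimal sketch, assuming the flags are keyed by the calendar day of the run (FLAG_DIR is a hypothetical name):

# hypothetical: build the flag directory from today's date (YYYY-MM-DD)
FLAG_DIR="/jobs/job1/$(date +%F)"
hadoop fs -mkdir -p "$FLAG_DIR"
hadoop fs -touchz "$FLAG_DIR/_STARTED"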

- Just throwing out one more option similar to Luigi, called "Airflow". Going with Luigi or Airflow is probably the better and more efficient way to do this. With Airflow (and Luigi as well), you set the job (DAG) to depend on past completion. – satish Jul 07 '16 at 15:24
- I've never used Airflow, so I can't comment. Just don't use Oozie. Also worth noting: the convention is for MapReduce jobs to drop a _COMPLETED flag in the destination directory when they finish. This behavior might be missing from Hive (it was missing in earlier versions), but that's the default flag that Luigi checks for completion, and I'd recommend sticking to this convention too. – Matthew Rathbone Jul 08 '16 at 16:21
- Worth noting: files that begin with an underscore are typically ignored as input by Hive/MapReduce/Pig scripts, so you can use them to your heart's content. – Matthew Rathbone Jul 08 '16 at 16:22
At the start of the script, have a check:
#!/bin/bash
# note: create /tmp/file.lock once by hand before the very first run,
# otherwise the script will always take the else branch and exit
if [ -e /tmp/file.lock ]; then
    rm /tmp/file.lock   # lock exists: previous run completed; remove it and continue
else
    exit                # no lock file: the previous execution has not completed yet
fi

....                    # Your script here

touch /tmp/file.lock    # mark this run as complete
There are many other ways of achieving the same thing; I am giving a simple example.
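For instance, a common variant does the opposite of the example above: it creates the lock at the start and removes it on exit, so an existing lock means a run is still in progress. A minimal sketch, not from the answer above; the lock path is a placeholder:

#!/bin/bash
LOCKFILE=/tmp/myjob.lock        # hypothetical lock path

if [ -e "$LOCKFILE" ]; then
    exit                        # previous run still in progress; skip this one
fi
touch "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT   # release the lock even if the script fails

# ... your Hive/Pig steps here

With this variant no manual setup is needed before the first run, and a crashed run does not block the next one.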

- What is in file.lock, and where should I write this check within the shell script? – Ironman Jul 07 '16 at 14:41
- file.lock is an empty file which I create using `touch` at the last line of the script. The `if` check in my example should come first, before your implementation. – Fazlin Jul 07 '16 at 14:48
- I tried your way, but the script exits and never starts; maybe my previous execution has not yet finished. How can I finish my previous execution? – Ironman Jul 08 '16 at 09:38
- I found my problem: I hadn't created file.lock beforehand, so the else part was executing. – Ironman Jul 08 '16 at 09:53