I have a system where a cron job is used to drive a manage.py command every minute.

The trouble is, the job can sometimes take longer than a minute, and it's not safe for two instances of the command to run at once.

Is there a good way to make the command detect if another instance of itself is already running and exit early? Is there a better way to achieve the same end?

ctford
    See [Python - Single instance of program](http://stackoverflow.com/questions/380870/python-single-instance-of-program) – Lukas Graf Nov 12 '13 at 07:30

4 Answers

You could also use something like django-cronjobs (disclaimer: I have not used it myself) to register a job. From the docs:

# myapp/cron.py
import cronjobs

@cronjobs.register
def periodic_task():
    pass

And then use:

$ ./manage.py cron periodic_task

What's more, django-cronjobs by default makes sure that only one copy of a job runs at a time.

Mark van Lent

What you could do is have the command create a file containing the pid of the job when it starts, before performing its task, and then clean up that file when it finishes.

When the command is run, it should first check whether that pidfile exists. If so, it should not perform its job.

So:

  1. Check whether the pidfile exists. If so, exit.
  2. Create the pidfile
  3. Perform the job
  4. Clean up the pidfile

It's not perfect (e.g. if the command does not finish properly, the pidfile is not removed and the command never runs again), but it might be good enough for your situation.
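The four steps above could be sketched roughly as follows. This is only a minimal illustration using the standard library: the names `PIDFILE`, `acquire_pidfile`, `release_pidfile` and `run_exclusively` are made up for this example, and the pidfile location is an assumption. It merges steps 1 and 2 by creating the file with `O_EXCL`, so two instances starting at the same moment cannot both pass the existence check.

```python
import os
import tempfile

# Hypothetical location; pick a path that suits your deployment.
PIDFILE = os.path.join(tempfile.gettempdir(), "myjob.pid")


def acquire_pidfile(path):
    """Create the pidfile atomically; return True if we now own it."""
    try:
        # O_EXCL makes creation fail if the file already exists, so the
        # existence check and the creation happen as one step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_pidfile(path):
    """Remove the pidfile, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_exclusively(job, path=PIDFILE):
    """Run job() unless another instance holds the pidfile.

    Returns True if the job ran, False if it was skipped.
    """
    if not acquire_pidfile(path):
        return False
    try:
        job()
    finally:
        # Runs even if job() raises, but not if the process is killed
        # outright (e.g. SIGKILL), so a stale pidfile is still possible.
        release_pidfile(path)
    return True
```

Inside a management command you would call something like `run_exclusively(do_the_work)` from `handle()`. The `finally` block covers ordinary exceptions, but the stale-pidfile caveat still applies if the process is killed hard or the machine goes down mid-run.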

Also see the question: What are pid and lock files for?

Mark van Lent

You can use a cron job library that takes care of locking jobs to prevent multiple executions (see "Preventing multiple executions").

As an alternative, you can use celerybeat instead of cron to control your jobs. Celerybeat comes with more overhead, but if you already use Celery as part of your application, this shouldn't be too hard. This question lists some advantages of celerybeat: "What are the advantages of celerybeat over cron?"

You have to persist state somewhere to indicate that a job is already running. The pid technique is fine, but an alternative is to use a semaphore, implemented either at the cache level (Memcached/Redis) or directly in the database. This is especially useful when there might not be a consistent file system available to manage pid files, e.g. when you are running your app on Heroku.
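A cache-level semaphore could look roughly like the sketch below. It assumes a cache client with an `add(key, value, timeout)` method that stores the key only if it is absent and returns whether it did so, plus a `delete(key)` method, which is the shape of Django's cache API. The `InMemoryCache` class is only a stand-in so the example runs on its own; the function name, key name and `ttl` value are made up for illustration.

```python
import time


class InMemoryCache:
    """Stand-in for a real cache client, for illustration only.

    It mimics an add/delete pair like the one Django's cache API exposes;
    unlike Memcached or Redis it is not shared between processes.
    """

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout):
        """Store value only if key is absent or expired; report success."""
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def run_with_cache_lock(cache, job, key="myjob-lock", ttl=300):
    """Run job() only if the lock key could be added to the cache.

    Returns True if the job ran, False if another run holds the lock.
    """
    # add() is expected to be atomic on the cache server, so only one
    # caller wins. The ttl lets a crashed run's lock expire on its own.
    if not cache.add(key, "locked", ttl):
        return False
    try:
        job()
    finally:
        cache.delete(key)
    return True
```

The expiry avoids the "never runs again" problem of a stale pidfile, at a price: if a run takes longer than `ttl` seconds, the lock expires and a second instance can start, so the value should comfortably exceed the worst-case run time.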

Ideally, also try to make your cron jobs idempotent if you can, i.e. running the job multiple times, even in parallel, should cause no harmful side effects.

Pratik Mandrekar

I have been using lockfile for this, and it works quite well.

Basic usage:

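import logging

from lockfile import FileLock, AlreadyLocked, LockTimeout

# Illustrative value: seconds to wait for the lock before giving up.
LOCK_WAIT_TIMEOUT = 5


def run_job(lock_name):
    # run_job is a placeholder name; in a management command this
    # would be the body of handle().
    lock = FileLock(lock_name)
    try:
        lock.acquire(LOCK_WAIT_TIMEOUT)
    except AlreadyLocked:
        logging.debug("lock already in place. quitting.")
        return
    except LockTimeout:
        logging.debug("waiting for the lock timed out. quitting.")
        return
    logging.debug("acquired.")

    try:
        pass  # do stuff...
    finally:
        # Release the lock even if the job raises.
        logging.debug("releasing lock...")
        lock.release()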
augustomen