I have a small Python script that creates a graph of data pulled from MySQL. I'm trying to figure out a way to run the script in the background all the time, on a regular basis. I've tried a number of things:

  1. A Cron Job that runs the script
  2. A loop timer
  3. Using the & command to run the script in the background

These all have their pluses and minuses:

  1. The Cron Job running more often than every half hour seems to eat up more resources than it's worth.
  2. The Loop timer put into the script doesn't actually put the script in the background; it just keeps it running.
  3. The Linux & command backgrounds the process, but unlike a real Linux service I can't restart/stop it without killing it.

Can someone point me to a way to get the best out of all of these methods?

user1441079
    #1 doesn't make any sense. What resources? Running from cron doesn't magically make your app consume more resources than usual. – Cat Plus Plus Jun 07 '12 at 00:59
  • I need to run the script every minute and cron seems to spike when it runs, and those spikes every minute or so seem to be slowing things down. Also cron never leaves the task list after the script runs so each time it does run it uses more and more memory. – user1441079 Jun 07 '12 at 01:05
  • Not to be insulting, but you're probably doing it wrong or interpreting some data wrong-- can you please post the specifics of what you are doing and seeing (cron line, ps output that makes you think it's spiking etc etc)? Cron has been around for a long time and is pretty stable, it seems unlikely cron *itself* is causing issues. And cron is *supposed* to always remain running btw -- that is how it can start jobs at arbitrary times. – Domingo Ignacio Jun 07 '12 at 01:11

2 Answers


Why don't you try making your script into a proper daemon? The python-daemon package, whose runner module is used below, is a good place to start.

import time

from daemon import runner


class App():
    def __init__(self):
        # Standard streams and pid file used by python-daemon's DaemonRunner
        self.stdin_path = '/dev/null'
        self.stdout_path = '/dev/tty'
        self.stderr_path = '/dev/tty'
        self.pidfile_path = '/tmp/your-pid-name.pid'
        self.pidfile_timeout = 5

    def run(self):
        while True:

            ### PUT YOUR SCRIPT HERE ###

            time.sleep(300)


app = App()
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()  # performs start/stop/restart as given on the command line

You can start/stop/restart this script just like any other Linux service.
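For example, if the file above were saved as graph_daemon.py (the filename is just for illustration), DaemonRunner reads the desired action from the first command-line argument:

python graph_daemon.py start
python graph_daemon.py restart
python graph_daemon.py stop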

secumind

The cron job is probably a good approach in general, as the shell approach requires manual intervention to start it.

A couple of suggestions:

You could use a lock file to ensure that the cron job only ever starts one instance of the Python script. Problems often occur when using cron for larger jobs because it starts a second instance before the first has actually finished. You can do this simply by checking at the beginning of the script whether the lock file exists: if it does, exit immediately, since an instance is already running; if it does not, 'touch' the file, do the work, and 'rm' it as the last action at the end of the script. (Of course, if the script dies you will have to delete the lock file before running the script again.) A sketch follows below.
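A minimal sketch of that lock-file check; the path and file name are just examples, not from the original script:

import os
import sys

LOCK_FILE = '/tmp/graph_script.lock'    # example path; pick one your user can write to

if os.path.exists(LOCK_FILE):
    sys.exit(0)                         # another instance appears to be running

open(LOCK_FILE, 'w').close()            # 'touch' the lock file
try:
    pass                                # ... query MySQL and draw the graph here ...
finally:
    os.remove(LOCK_FILE)                # 'rm' the lock file, even if the work above raises

The try/finally removes the lock even if the script raises an exception; a hard kill would still leave a stale file behind, as noted above.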

Also, if excessive resource use is a problem, you can give the script a low scheduling priority by prefixing the command with, for example, nice -n 19.
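For example, a crontab entry along these lines (the script path is a placeholder) runs the job every 30 minutes at the lowest priority:

*/30 * * * * nice -n 19 python /path/to/graph_script.py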

Soz
  • As you said, the cron job is good in general, but now the OP said it has to run every minute, which means the job will have to be in memory most of the time. So I would take the daemon approach. To pull data from MySQL every minute doesn't sound right, though. – Kenji Noguchi Jun 07 '12 at 01:39
  • Yes, I agree that if it has to run every minute the daemon approach would be worth pursuing. However, the symptoms described by the OP (each time it runs it uses more memory etc.) seem to suggest, albeit indirectly, that the script takes a considerable time to complete. Might be worth benchmarking it and figuring out where the bottlenecks lie, then figuring out what a realistic schedule might be given the required execution time/resources. – Soz Jun 07 '12 at 02:00
  • I agree with @Soz's comment: you should look into where any potential bottlenecks might be in your querying script. Also benchmark it and see if it takes more than a minute to complete; if so, then your cron job or daemon loop should not run every minute. Also, if you are graphing the data in Python based on a MySQL query, and this data changes continuously, you may want to consider keeping it in an array; that way your MySQL queries can simply be for records that have changed since the last timestamp in the array. Those new values can be added to the array and the data graphed. – secumind Jun 07 '12 at 02:34
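A rough sketch of the incremental-query idea from that last comment, using the MySQLdb module as an example; the connection details and the readings/value/updated_at table and column names are made up for illustration:

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='user', passwd='secret', db='mydb')  # placeholder credentials

data = []          # cached rows kept in memory between runs
last_seen = None   # timestamp of the newest row already cached

def fetch_new_rows():
    global last_seen
    cur = conn.cursor()
    if last_seen is None:
        cur.execute("SELECT value, updated_at FROM readings")  # first run: fetch everything
    else:
        cur.execute("SELECT value, updated_at FROM readings WHERE updated_at > %s", (last_seen,))
    rows = cur.fetchall()
    cur.close()
    if rows:
        data.extend(rows)
        last_seen = max(row[1] for row in rows)  # remember the newest timestamp seen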