1

This seems like a simple question, but I am having trouble finding the answer.

I am making a web app which would require the constant running of a task.

I'll use sites like Pingdom or Twitterfeed as an analogy. As you may know, Pingdom checks uptime, so is constantly checking websites to see if they are up and Twitterfeed checks RSS feeds to see if they;ve changed and then tweet that. I too need to run a simple script to cycle through URLs in a database and perform an action on them.

My question is: how should I implement this? I am familiar with cron, currently using it to do my server backups. Would this be the way to go?

I know how to make a Python script which runs indefinitely, starting back at the beginning with the next URL in the database when I'm done. Should I just run that on the server? How will I know it is always running and doesn't crash or something?

I hope this question makes sense and I hope I am not repeating someone else or anything.

Thank you,

Sam

Edit: To be clear, I need the task to run constantly. As in, check URL 1 in the database, check URl 2 in the database, check URL 3 and, when it reaches the last one, go right back to the beginning. Thanks!

samiles
  • 3,768
  • 12
  • 44
  • 71

2 Answers2

1

If you need a repeatable running of the task which can be run from command line - that's what the cron is ideal for. I don't see any demerits of this approach.

Update: Okay, I saw the issue somewhat different. Now I see several solutions:

  • run the cron task at set intervals, let it process the data once per run, next time it will process the data on another run; use PIDs/Database/semaphores to avoid parallel processes;
  • update the processes that insert/update data in the database; let the information be processed when it is inserted/updated; c)
  • write a demon process which will reside in memory and check the data in real time.
Minras
  • 4,136
  • 4
  • 18
  • 18
  • But how would I use cron to run a task constantly? Is cron not for running a task at set intervals? Or, do you mean to star the task every hour and have it kill itself after running for an hour? Thanks. – samiles Dec 21 '11 at 15:12
  • My comment was too big so I updated my initial reply. Please look at it. – Minras Dec 21 '11 at 15:26
1

cron would definitely be a way to go with this, as well as any other task scheduler you may prefer.

The main point is found in the title to your question:

Run a repeating task for a web app

The background task and the web application should be kept separate. They can share code, they can share access to a database, but they should be separate and discrete application contexts. (Consider them as separate UIs accessing the same back-end logic.)

The main reason for this is because web applications and background processes are architecturally very different and aren't meant to be mixed. Consider the structure of a web application being held within a web server (Apache, IIS, etc.). When is the application "running"? When it is "on"? It's not really a running task. It's a service waiting for input (requests) to handle and generate output (responses) and then go back to waiting.

Web applications are for responding to requests. Scheduled tasks or daemon jobs are for running repeated processes in the background. Keeping the two separate will make your management of the two a lot easier.

David
  • 208,112
  • 36
  • 198
  • 279
  • Thank you. In my title, I really just meant that the web app that the user sees relies on the data generated by the background task, just to give it some context. Perhaps I worded it badly :) – samiles Dec 21 '11 at 15:11
  • @samiles: No biggie, hopefully this post will help someone who stumbles across the question on a Google search someday. In response to your comment on Minras' answer, this thread might help: http://stackoverflow.com/questions/473620/how-do-you-create-a-daemon-in-python – David Dec 21 '11 at 15:15