How can I run a backend in django that is saving objects in the database while the page is running?

Example: a scraper runs in the background indefinitely and updates the models with articles (title, summary, url). The page shows the articles that are already in the database, and possibly a count of the articles scraped in the meantime (like on Twitter), which you could then load.

One way would be to write a python-mysql script which updates the table directly. But is there a way of accessing the Django models interface? In other words, what's the Django way of doing this?

Oliver

3 Answers

Also, if you want something a little simpler than Celery + RabbitMQ, and don't want to muck around in cron too much, the django-extensions app has a pretty slick Jobs feature (http://packages.python.org/django-extensions/jobs_scheduling.html). It only supports daily, hourly, weekly and monthly jobs, but you only ever have to edit your crontab once.

Chris Lawlor

The straightforward answer is that you can't really do it with Django as it is, from within an HTTP request, because it supports neither background execution nor WebSockets out of the box.

You can actually find quite a lot of related answers on Stack Overflow, too many to list here, but few of them put together a complete answer. Basically you should be able to achieve what you want using:

A more generic answer covering several alternatives would probably be out of scope for SO, but if you get started and run into more specific issues, let us know.

Edit: of course, an alternative to a fully managed Celery system is good old cron (as you suggest) plus a custom ./manage.py command, which lets you use the Django models, as @DTing suggests!

Stefano
  • @Oliver: since then, and mostly out of curiosity, I've done more research myself on 'background jobs', which you can find in this question: http://stackoverflow.com/questions/8068945/django-long-running-asynchronous-tasks-with-threads-processing. I did not edit my answer in that respect because using Celery, as you can read there, is still the only suggested method for asynchronous tasks, but if you are interested in the hard-core alternatives and their limitations, there you go! – Stefano Nov 29 '11 at 18:09

You can write a custom management command (see the Django docs) and set up a cron job to execute it at the desired interval.
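As a rough illustration of the management command approach, here is a minimal sketch. The app name `myapp`, the `Article` model with `title`, `summary` and `url` fields, the command name `scrape_articles` and the `fetch_articles` placeholder are all assumptions made for the example, not something taken from the question's actual project.

```python
# Hypothetical location: myapp/management/commands/scrape_articles.py
try:
    from django.core.management.base import BaseCommand
except ImportError:
    # Fallback only so this sketch can be imported without Django installed.
    BaseCommand = object


def fetch_articles():
    """Placeholder for the real scraper; yields dicts with title, summary, url."""
    yield {
        "title": "Example title",
        "summary": "Example summary",
        "url": "http://example.com/article-1",
    }


def save_articles(items, manager):
    """Insert or update one row per URL and return how many rows were new.

    `manager` is expected to behave like a Django model manager,
    for example Article.objects.
    """
    created_count = 0
    for item in items:
        _, created = manager.update_or_create(
            url=item["url"],
            defaults={"title": item["title"], "summary": item["summary"]},
        )
        created_count += int(created)
    return created_count


class Command(BaseCommand):
    help = "Scrape articles and store them through the Django ORM."

    def handle(self, *args, **options):
        # Imported lazily so the helpers above stay importable outside a
        # Django project; `myapp` and `Article` are assumed names.
        from myapp.models import Article

        new_rows = save_articles(fetch_articles(), Article.objects)
        self.stdout.write(f"Stored {new_rows} new article(s)")
```

With a file like this in place, `python manage.py scrape_articles` would run the scraper once with full access to the models, and a crontab entry calling that same command (with whatever interpreter and project paths apply on your machine) would repeat it at the chosen interval. Keying `update_or_create` on `url` is a design choice that keeps repeated runs from creating duplicate rows.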

Ajax can be used to load the new data into pages that are already open; new requests will pull the updated data from the database anyway.

dting