
I have a standalone script that scrapes a page, opens a connection to a database, and writes data to it. I need it to execute periodically, every x hours. I could do this with a bash script along these lines:

while true
do
  python scraper.py
  sleep $((60 * 60 * x))   # x hours, converted to seconds
done

From what I have read about message brokers, they are used to send "signals" from one running program to another, similar in principle to HTTP. For example, one piece of code accepts an email id from a user and sends a signal with that email-id to another piece of code, which actually sends the email.

I need Celery to run a periodic task on Heroku. I already have MongoDB on a separate server. Why do I need to run another server for RabbitMQ or Redis just for this? Can I use Celery without the broker?

Tejus Prasad
yayu

3 Answers


Celery's architecture is designed to scale and distribute tasks across several servers. For a site like yours it is probably overkill. A queue service is generally needed to maintain the task list and to signal the status of finished tasks.

You might want to take a look at Huey instead. Huey is a small-scale Celery clone that needs only Redis as an external dependency, not RabbitMQ. It still uses Redis's queue mechanism to line up the tasks.

There is also Advanced Python Scheduler (APScheduler), which does not even need Redis: it can hold the state of the queue in memory, in-process.
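To illustrate the in-process idea (no broker, no external service at all), here is a minimal sketch using only the standard library; `scrape` and the three-hour interval are placeholders for your own scraper and schedule:

```python
import threading

def run_periodically(func, interval_seconds, stop_event):
    """Call func, then wait interval_seconds, until stop_event is set."""
    while not stop_event.is_set():
        func()
        stop_event.wait(interval_seconds)  # interruptible sleep

def scrape():
    print("scraping...")  # placeholder for your scraper logic

stop = threading.Event()
worker = threading.Thread(
    target=run_periodically, args=(scrape, 3 * 60 * 60, stop), daemon=True
)
worker.start()
# later, to shut down cleanly:
# stop.set(); worker.join()
```

The `stop_event.wait()` call doubles as a sleep that can be interrupted immediately on shutdown, unlike a plain `time.sleep()`.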

Alternatively, if you have a very small number of periodic tasks and no delayed tasks, I would just use cron and pure Python scripts to run them.
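For the cron route, a crontab entry like the following would do; the schedule and paths here are hypothetical, so adjust them to your setup (and note that on Heroku you would typically reach for the Heroku Scheduler add-on rather than classic cron):

```
# m h dom mon dow  command — run the scraper every 3 hours
0 */3 * * * /usr/bin/python /home/you/scraper.py >> /home/you/scraper.log 2>&1
```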

Mikko Ohtamaa

As the Celery documentation explains:

Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, a client adds a message to the queue, which the broker then delivers to a worker.

You can use your existing MongoDB database as the broker; see Using MongoDB in the Celery documentation.
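A minimal sketch of what that configuration might look like; the connection URI and database name are hypothetical, and note that the MongoDB transport (provided by Kombu) has been considered experimental:

```
# celeryconfig.py — point Celery's broker at an existing MongoDB instance
broker_url = "mongodb://localhost:27017/celery_broker"
```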

Lukas Batteau

For an application like this, it's better to use Django Background Tasks.

Installation

Install from PyPI:

pip install django-background-tasks

Add to INSTALLED_APPS:

INSTALLED_APPS = (
    # ...
    'background_task',
    # ...
)

Migrate your database:

python manage.py makemigrations background_task
python manage.py migrate

Creating and registering tasks

To register a task use the background decorator:

from background_task import background
from django.contrib.auth.models import User

@background(schedule=60)
def notify_user(user_id):
    # lookup user by id and send them a message
    user = User.objects.get(pk=user_id)
    user.email_user('Here is a notification', 'You have been notified')

This will convert notify_user into a background task function. When you call it from regular code, it will actually create a Task object and store it in the database. The database then contains serialised information about which function needs to run later. This places limits on the parameters that can be passed when calling the function: they must all be serializable as JSON. That is why, in the example above, a user_id is passed rather than a User object.
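That JSON constraint is easy to check directly, using only the standard library:

```python
import json

# Task arguments must survive a round trip through JSON.
print(json.dumps({"user_id": 42}))   # fine: a plain int serializes

try:
    json.dumps({"user": object()})   # a model-like object does not
except TypeError:
    print("pass the id instead, not the object")
```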

Calling notify_user as normal will schedule the original function to be run 60 seconds from now:

notify_user(user.id)

This is the default schedule time (as set in the decorator), but it can be overridden:

from datetime import timedelta
from django.utils import timezone

notify_user(user.id, schedule=90)                     # 90 seconds from now
notify_user(user.id, schedule=timedelta(minutes=20))  # 20 minutes from now
notify_user(user.id, schedule=timezone.now())         # at a specific time

You can also run the original function right now, in synchronous mode:

notify_user.now(user.id)        # run notify_user immediately and wait for it
notify_user = notify_user.now   # revert the task function back to a normal function

This is useful for testing. You can specify a verbose name and a creator when scheduling a task:

notify_user(user.id, verbose_name="Notify user", creator=user)
Yuseferi