I've started a new Python 3 project whose goal is to download tweets and analyze them. Since I'll be downloading tweets on different subjects, I want a pool of workers that download Twitter statuses matching given keywords and store them in a database. I call these workers fetchers.

The other kind of worker is the analyzer, whose job is to analyze tweet contents and extract information from them, also storing the results in a database. Since I'll be analyzing a lot of tweets, it would be a good idea to have a pool of these workers too.

I've been thinking of using RabbitMQ and Celery for this, but I have some questions:

  • General question: Is this really a good approach to solve this problem?
  • I need at least one fetcher worker per downloading task, and this could be running for a whole year (actually, it is a 15-minute cycle that repeats and lasts for a year). Is it appropriate to define an "infinite" task?
  • I've been trying Celery and I used delay to launch some example tasks. The thing is that I don't want to call the ready() method constantly to check whether the task has completed. Is it possible to define a callback? I'm not talking about a Celery task callback, just a function defined by myself. I've been searching for this and I can't find anything.
  • I want to have a single RabbitMQ + Celery server with workers in different networks. Is it possible to define remote workers?
David Moreno García

1 Answer

  1. Yeah, it looks like a good approach to me.

  2. There is no such thing as an infinite task. You can reschedule a task to run once in a while. Celery has periodic tasks, so you can schedule a task to run at particular times. You don't necessarily need Celery for this; you can also use a cron job if you want.

  3. You can call a function once a task is successfully completed.

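from celery.signals import task_success

# Signals are attached with .connect; the handler runs in the worker process
# that executed the task, and Celery passes the finished task as `sender`.
@task_success.connect
def call_me_when_my_task_is_done(sender=None, result=None, **kwargs):
    # Filter on the registered task name so other tasks are ignored.
    if sender.name == 'task_i_am_waiting_to_complete':
        pass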
  4. Yes, you can have remote workers on different networks.
Chillar Anand
  • 2. It is not exactly infinite. I call the task, and after 15 minutes it should stop and run a secondary task which, once finished, calls the first task again. This repeats until I stop it (maybe not a real worker here; I could use Flask for this and create a simple API). 3. So I need Celery in the worker and in the clients that call the worker tasks. Perfect. I had a lot of trouble finding this. 4. Where can I find some info about this? I can't find anything. – David Moreno García Oct 31 '14 at 08:08