I've started a new Python 3 project whose goal is to download tweets and analyze them. Since I'll be downloading tweets about different subjects, I want a pool of workers that download Twitter statuses matching the given keywords and store them in a database. I call these workers *fetchers*.
The other kind of worker is the *analyzer*, whose job is to analyze tweet contents, extract information from them, and store the results in a database as well. Since I'll be analyzing a lot of tweets, it would be a good idea to have a pool of these workers too.
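To make the setup concrete, here is a rough sketch of the two worker types as Celery tasks (anticipating the Celery choice I mention below; `fetch_from_twitter`, `store_tweets`, `extract_info` and `store_result` are placeholder stubs standing in for my real logic):

```python
# tasks.py -- a rough sketch of the pipeline, not working code.
from celery import Celery

app = Celery('twitter', broker='amqp://guest@localhost//')

# Placeholder stubs standing in for the real Twitter/database code.
def fetch_from_twitter(keywords): return []
def store_tweets(tweets): pass
def extract_info(tweet): return {}
def store_result(result): pass

# Route each task type to its own queue so the two pools stay separate
# (Celery 4-style setting; older versions use CELERY_ROUTES).
app.conf.task_routes = {
    'tasks.fetch':   {'queue': 'fetchers'},
    'tasks.analyze': {'queue': 'analyzers'},
}

@app.task
def fetch(keywords):
    """Fetcher: download statuses matching the keywords and store them."""
    tweets = fetch_from_twitter(keywords)
    store_tweets(tweets)
    for tweet in tweets:
        # Hand each tweet over to the analyzer pool.
        analyze.delay(tweet)

@app.task
def analyze(tweet):
    """Analyzer: extract information from one tweet and store the result."""
    store_result(extract_info(tweet))
```

Each pool would then be started against its own queue, e.g. `celery -A tasks worker -Q fetchers` for the fetchers and `celery -A tasks worker -Q analyzers` for the analyzers.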
I've been thinking of using RabbitMQ and Celery for this, but I have some questions:
- General question: Is this really a good approach to solving this problem?
- I need at least one *fetcher* worker per download task, and each one could be running for a whole year (actually it's a 15-minute cycle that repeats and lasts for a year). Is it appropriate to define an "infinite" task like that? (See the first sketch after this list.)
- I've been trying Celery and used `delay()` to launch some example tasks. The thing is that I don't want to call the `ready()` method constantly to check whether a task has completed. Is it possible to define a callback? I'm not talking about a Celery task callback, just a function defined by myself. I've been searching for this and I can't find anything. (See the second sketch after this list.)
- I want to have a single RabbitMQ + Celery server with workers in different networks. Is it possible to define remote workers? (See the last sketch after this list.)
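Regarding the "infinite" fetcher task, the alternative I can think of is letting celery beat enqueue the 15-minute cycle, something like this (a sketch building on the `tasks.py` above; the entry name and keywords are made up):

```python
# Trigger the fetch every 15 minutes with celery beat instead of
# looping forever inside a single task (beat_schedule is the
# Celery 4-style name; older versions use CELERYBEAT_SCHEDULE).
app.conf.beat_schedule = {
    'fetch-python-tweets-every-15-min': {
        'task': 'tasks.fetch',
        'schedule': 15 * 60.0,      # seconds
        'args': (['python'],),      # example keyword list
    },
}
```

A `celery -A tasks beat` process would then enqueue a fresh `fetch` task every 15 minutes, and I'd simply stop it after the year is over. Is that better than one never-ending task?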
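Regarding the callback, the closest thing I've found is linking a second task that wraps my own code, roughly like this (again a sketch; `on_done` is a name I made up):

```python
@app.task
def on_done(result):
    # My own code, invoked when fetch() finishes; `result` is whatever
    # fetch() returned. Note this runs on a worker process, not in the
    # process that called apply_async().
    print('fetch finished with:', result)

# Instead of polling ready() on the AsyncResult:
fetch.apply_async((['python'],), link=on_done.s())
```

I'm not sure this is what I want, though, because `on_done` still has to be registered as a Celery task rather than being a plain local function.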
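Regarding remote workers, I assume it just means pointing every worker machine at the single central broker, something like this (hostname and credentials are placeholders):

```python
# On each remote worker machine the app points at the central
# RabbitMQ broker instead of a local one:
app = Celery('twitter',
             broker='amqp://user:password@rabbit.example.com:5672//')

# and each machine runs only the queue it is responsible for, e.g.:
#   celery -A tasks worker -Q fetchers  --loglevel=info
#   celery -A tasks worker -Q analyzers --loglevel=info
```

Is that all there is to it, or does running workers across different networks need extra configuration?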