Let's say I have:

  • a list of 3 pairs (login, password) and I intend to create one urllib2 opener for each pair
  • one task in Celery
  • concurrency = 3

I want to tie each opener to a Task instance (so each task has its own opener, i.e. with different auth cookies).

What I currently do is subclass from Task:

    class TaskWithOpener(Task):
        abstract = True
        _openers = None

        @property
        def openers(self):
            if self._openers is None:
                print 'creating openers for', self
                (...)
            print 'openers already created for ', self, ' just returning them'
            return self._openers

and define the task like this:

    @my_celery.task(rate_limit='5/m', base=TaskWithOpener)
    def my_task():
        opener = random.choice(my_task.openers)

But this way each task has a list of multiple openers, and they are created for each thread separately. So with 3 credential pairs (login, password) and concurrency = 3, my program creates 9 openers, which is unacceptable.

Matt

1 Answer

This is perfectly valid behavior for Celery. You've basically created a class that creates three openers per instance, and then instantiated it three times.

What you're trying to do is to spawn three tasks, each with its own set of credentials:

    @celery.task(rate_limit='5/m')
    def the_task(login, password):
        opener = create_opener(login, password)
        …

Then you can call it like:

    credentials = [
        ('login1', 'password1'),
        ('login2', 'password2'),
        ('login3', 'password3'),
    ]

    for login, password in credentials:
        the_task.delay(login, password)

That way the worker will receive three tasks and apply the rate limit to them.

Update:

From your comment and the code, I suspect you want to make openers a class attribute.

The problem is that assigning to the attribute on self makes it an instance attribute.

I think you're trying to create a class property.
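
The difference between the two kinds of attribute can be shown with a minimal sketch. The class names and the placeholder strings standing in for real openers are made up for illustration; this is not your code, just the same caching pattern written both ways:

```python
class InstanceCache(object):
    """Same pattern as in the question: the cache is assigned through self."""
    _openers = None
    created = 0  # how many times the "expensive" creation step ran

    @property
    def openers(self):
        if self._openers is None:
            InstanceCache.created += 1
            # Assigning on self creates an instance attribute that shadows
            # the class attribute; other instances still see None.
            self._openers = ['opener-1', 'opener-2', 'opener-3']
        return self._openers


class ClassCache(object):
    """Variant that stores the cache on the class itself."""
    _openers = None
    created = 0

    @property
    def openers(self):
        cls = type(self)
        if cls._openers is None:
            cls.created += 1
            # Assigning on the class makes the value visible to every instance.
            cls._openers = ['opener-1', 'opener-2', 'opener-3']
        return cls._openers


a, b = InstanceCache(), InstanceCache()
a.openers
b.openers
c, d = ClassCache(), ClassCache()
c.openers
d.openers

print(InstanceCache.created)  # 2: each instance built its own list
print(ClassCache.created)     # 1: the list was built once and shared
```

Note that a class attribute is shared only inside one Python process. If the worker runs several processes, each of them would presumably still build its own copy.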

Honestly, I don't think this is a good solution. I would like to know why you don't want to create the opener each time.

Is it costly? Then what you're looking for is not a task queue plus workers but a server that runs constantly (it could be implemented on top of Twisted).

Krzysztof Szularz
  • Thank you. But what if I send tasks to the queue using send_task from the web interface and the workers should be independent? Would it even work if openers are created on a separate machine? Even if so, I cannot create openers for each web request, not to mention each task execution as you propose. I need them to be created exactly once for each (login, password) and then reused. How would you handle this, Krzysiek? – Matt Nov 20 '13 at 11:57
  • See my update. Please update your question with the reason why you need the shared "opener"? Does it share some state between requests? Furthermore, did you see https://github.com/kennethreitz/grequests? It seems like it can solve your problem. – Krzysztof Szularz Nov 20 '13 at 16:20
  • Class properties behave the same way - perhaps because I'm testing this on Windows? I need shared "openers" because creating them means going to a website through a given proxy (not mentioned earlier), logging in with the given credentials and saving the auth cookie. Doing that many times causes not only unnecessary load on the servers but also bans. I'm open to every suggestion, but it seems that I need to run tasks with objects that are prepared earlier (in other words: running tasks in a given context). I don't really care if they are tasks' properties, so my question may be poorly formulated. – Matt Nov 20 '13 at 21:20
  • grequests is cool and I'll use it in some small scripts - but at this time fetching websites is only part of the story (there is also parsing, saving back to the db etc.) – Matt Nov 20 '13 at 21:21
  • Are you aware of cookie jars? http://docs.python.org/2/library/cookielib.html They let you store authorization data in some place, so all tasks can share them. – Krzysztof Szularz Nov 21 '13 at 11:05
  • To make this a class attribute you can simply use `self.__class__._openers` instead of `self._openers`. This way subclasses will reuse the same openers. A random choice of openers seems silly if you are using the prefork pool, since only one can be used at a time anyway. – asksol Nov 21 '13 at 14:40
  • @asksol: This basically answers my question. Krzysztof: yes. But I thought it wasn't a good idea to log in on one machine and use it on another (but, since I'm using proxies anyway, this will be the best solution). – Matt Nov 23 '13 at 12:04
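
The cookie jar suggestion from the comments could look roughly like the sketch below. It uses the Python 3 module `http.cookiejar` (the renamed `cookielib`) and `urllib.request` in place of `urllib2`. The cookie name, value, domain and file name are hypothetical, and the real login request is replaced by a hand-built cookie so that the example runs without network access:

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_auth_cookie(value):
    """Stand-in for the cookie a real login would set (made-up values)."""
    return Cookie(
        version=0, name='sessionid', value=value,
        port=None, port_specified=False,
        domain='example.com', domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=False,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical location; with several machines this would need shared storage.
jar_path = os.path.join(tempfile.mkdtemp(), 'login1.cookies')

# "Login" side: done once per (login, password) pair, then saved to disk.
login_jar = LWPCookieJar(jar_path)
login_jar.set_cookie(make_auth_cookie('abc123'))
login_jar.save(ignore_discard=True)  # keep session cookies as well

# Task side: load the saved cookies instead of logging in again.
task_jar = LWPCookieJar(jar_path)
task_jar.load(ignore_discard=True)
opener = build_opener(HTTPCookieProcessor(task_jar))

print([(c.name, c.value) for c in task_jar])  # [('sessionid', 'abc123')]
```

This sketch does not handle expired sessions or several tasks writing the same file at once; both would need extra care in a real setup.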