I have the following setup (Docker):
- Celery, linked to the Flask setup, which runs the Scrapy spider
- Flask setup (obviously)
- The Flask setup gets a request for Scrapy -> fires up a worker to do some work (roughly the wiring sketched below)
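For context, a minimal sketch of what I mean by that wiring; the broker URL, module names and `MySpider` are placeholders rather than my actual code:

```python
# Rough shape of the setup; broker URL, module names and `MySpider` are
# placeholders, not the actual project code.
from celery import Celery
from flask import Flask, jsonify
from scrapy.crawler import CrawlerProcess

from myproject.spiders import MySpider  # hypothetical spider module

flask_app = Flask(__name__)
celery_app = Celery('tasks', broker='redis://redis:6379/0',
                    backend='redis://redis:6379/0')


@celery_app.task
def run_spider(start_url):
    # Runs in the Celery worker container; blocks until the crawl finishes.
    process = CrawlerProcess(settings={'LOG_ENABLED': False})
    process.crawl(MySpider, start_url=start_url)
    process.start()


@flask_app.route('/scrape/<path:url>')
def scrape(url):
    # Flask only dispatches the task and hands back an id to poll later.
    task = run_spider.delay(url)
    return jsonify({'task_id': task.id})
```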
Now I wish to update the original Flask setup on the progress of the Celery worker, BUT there is currently no way to call celery.update_state()
inside of the scraper, as it has no access to the original task (even though it is being run inside of the Celery task).
As an aside: am I missing something about the structure of Scrapy? It would seem reasonable that I could assign arguments inside of __init__
to use further on, but Scrapy seems to treat the method like a lambda function.. (a sketch of passing arguments through __init__ follows below).
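For what it's worth, Scrapy does forward extra keyword arguments from `crawl()` (or `-a name=value` on the command line) into the spider's `__init__`, so they can be stored on the instance. A minimal sketch against the Scrapy demo site; the `progress_callback` argument is purely illustrative:

```python
import scrapy
from scrapy.crawler import CrawlerProcess


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com/']

    def __init__(self, progress_callback=None, *args, **kwargs):
        # Keyword arguments given to crawl() (or -a name=value) arrive here,
        # so they can be stored on the instance and used later in parse().
        super().__init__(*args, **kwargs)
        self.progress_callback = progress_callback
        self.scraped_count = 0

    def parse(self, response):
        for quote in response.css('div.quote'):
            self.scraped_count += 1
            if self.progress_callback:
                self.progress_callback(self.scraped_count)
            yield {'text': quote.css('span.text::text').get()}


process = CrawlerProcess(settings={'LOG_ENABLED': False})
# Extra keyword arguments to crawl() are passed through to __init__.
process.crawl(QuotesSpider, progress_callback=print)
process.start()
```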
To answer some questions:
How are you using Celery with Scrapy?
Scrapy is running inside of a Celery task, not run from the command line. I have also never heard of scrapyd; is this a subproject of Scrapy? I use a remote worker to fire off Scrapy from inside of a Celery/Flask instance, so it is not the same as the thread instanced by the original request; they are separate Docker instances.
The task.update_state call works great inside of the Celery task, but as soon as we are 'in' the spider, we no longer have access to Celery. Any ideas? (See the sketch below for where that access gets lost.)
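To show where the access gets lost, a sketch of the current task; `QuotesSpider` and its `progress_callback` argument refer to the illustrative spider above, not anything built into Scrapy:

```python
from celery import Celery
from scrapy.crawler import CrawlerProcess

from myproject.spiders import QuotesSpider  # hypothetical spider module

celery_app = Celery('tasks', broker='redis://redis:6379/0',
                    backend='redis://redis:6379/0')


@celery_app.task(bind=True)
def run_spider(self):
    # Here `self` is the task instance, so update_state is available...
    self.update_state(state='PROGRESS', meta={'status': 'starting crawl'})

    process = CrawlerProcess(settings={'LOG_ENABLED': False})
    # ...but once Scrapy takes over, the spider has no reference to `self`
    # unless it is handed in explicitly, e.g. as a spider argument.
    process.crawl(
        QuotesSpider,
        progress_callback=lambda n: self.update_state(
            state='PROGRESS', meta={'items_scraped': n}),
    )
    process.start()
```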
From the item_scraped signal, issue Task.update_state(task_id, meta={}). You can also run it without the task_id if Scrapy happens to be running in a Celery task itself (as it defaults to self).
Is this sort of like a static way of accessing the current Celery task? Because I would love that....
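For reference, Celery does expose a `celery.current_task` proxy that resolves to whatever task the current worker process is executing, so the spider can reach the task without an explicit reference being passed in, as long as the crawl really does run inside the task. A sketch combining it with the item_scraped signal; the spider name and demo site are again placeholders:

```python
import scrapy
from scrapy import signals
from celery import current_task


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com/']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items_scraped = 0

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Report progress every time an item makes it through the pipeline.
        crawler.signals.connect(spider.item_scraped,
                                signal=signals.item_scraped)
        return spider

    def item_scraped(self, item, response, spider):
        self.items_scraped += 1
        # current_task resolves to the Celery task running in this worker;
        # it is falsy when the spider runs outside of Celery.
        if current_task:
            current_task.update_state(
                state='PROGRESS', meta={'items_scraped': self.items_scraped})

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').get()}
```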