I am building my first project that incorporates Scrapy. Everything works well on my development server (Windows), but I have a few issues on Heroku. I am using django-dynamic-scraper, which handled a lot of the integration work for me.
On Windows I run the following commands in separate command prompts:
: scrapy server
: python manage.py celeryd -l info
: python manage.py celerybeat
On heroku I run the following:
: heroku bash >heroku run scrapy server (solves app not found issue)
: heroku run python manage.py celeryd -l info -B --settings=myapp.production
The Django app itself has no errors or issues, and I can access the admin website. The scrapy server starts:
: Scrapyd web console available at http://0.0.0.0:6800/
: [Launcher] Scrapyd started: max_proc=16, runner='scrapyd.runner'
: Site starting on 6800
: Starting factory <twisted.web.server.Site instance at 0x7f1511f62ab8>
Celery beat and the worker are also running:
: INFO/Beat] beat: Starting...
: INFO/Beat] Writing entries...
: INFO/MainProcess] Connected to django://guest:**@localhost:5672//
: WARNING/MainProcess] celery@081b4100-eb7f-441c-976d-ecf97d2d7e5a ready.
: INFO/Beat] Writing entries...
: INFO/Beat] Writing entries...
FIRST ISSUE: When the periodic task that runs the spider is triggered, I get the following error in the Celery log:
: File "/app/.heroku/python/lib/python2.7/site-packages/dynamic_scraper/utils/task_utils.py", line 31, in _pending_jobs
:   resp = urllib2.urlopen('http://localhost:6800/listjobs.json?project=default')
: ...
: ...
: File "/app/.heroku/python/lib/python2.7/urllib2.py", line 1184, in do_open
:   raise URLError(err)
: URLError: <urlopen error [Errno 111] Connection refused>
So it seems that, on Heroku, the Celery worker cannot connect to the scrapy server at localhost:6800, even though the server reports that it is listening on port 6800.
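To narrow this down, one option is to test from inside the same process environment as the Celery worker whether anything accepts TCP connections on that port. The following is only a minimal sketch: the host and port are copied from the URL in the traceback, and `port_is_open` is a hypothetical helper name, not part of scrapyd or django-dynamic-scraper.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Covers "connection refused" (Errno 111) as well as timeouts.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Host and port assumed from the failing listjobs.json URL.
    print(port_is_open("localhost", 6800))
```

If this prints False where the Celery worker runs, but True in the session where the scrapy server was started, that would suggest the two processes do not share the same localhost.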
Here are some of my settings:
scrapy.cfg:
: [settings]
: default = myapp.scraper.scrape.settings
: [deploy]
: #url = http://localhost:6800/
: project = myapp
Celery config:
: [config]
: app: default:0x7fd4983f6310 (djcelery.loaders.DjangoL
: transport: django://guest:**@localhost:5672//
: results: database
: concurrency: 4 (prefork)
: [queues]
: celery exchange=celery(direct) key=celery
Thanks in advance and let me know if you need any more info.