
What's the better way to organize Django + Scrapy? My goal is to use Django to create Scrapy tasks and have Scrapy populate the Django database.

I previously created a Scrapy project and a Django project that both sat in the root directory, but I had a lot of PATH and os.environ problems and ended up needing scripts that set my PATH just to get things to run. I want to avoid that.

I've seen two solutions:

  1. Start a Django project and create a Scrapy sub-project, using the SCRAPY_SETTINGS_MODULE environment variable (see Mikhail's answer to "Access django models inside of Scrapy")
  2. Start a Scrapy project, use DjangoItem, and set the DJANGO_SETTINGS_MODULE environment variable
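For reference, here is a minimal sketch of what option 2 might look like from the Scrapy side. The helper name `point_scrapy_at_django`, the `mysite.settings` module, and the directory layout are hypothetical, and the `django.setup()` call assumes a Django version that provides it (1.7 or later). The idea is that the Scrapy settings file makes the Django project importable and names its settings module, so no external PATH script is needed.

```python
import os
import sys
from pathlib import Path


def point_scrapy_at_django(django_root, settings_module):
    """Make the Django project importable and name its settings module.

    Both arguments are assumptions about your layout, e.g.
    django_root="/path/to/mysite_project", settings_module="mysite.settings".
    """
    root = str(Path(django_root).resolve())
    if root not in sys.path:
        # Put the Django project on the import path for this process only.
        sys.path.insert(0, root)
    # setdefault keeps any value already exported in the shell.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return root


def setup_django(django_root, settings_module):
    """Call this from the Scrapy settings.py before importing any models."""
    point_scrapy_at_django(django_root, settings_module)
    import django  # imported late so the environment is prepared first

    django.setup()  # assumes Django >= 1.7
```

In a Scrapy `settings.py` this could be called as `setup_django(Path(__file__).resolve().parents[2], "mysite.settings")`, where the number of parent directories depends on where the Scrapy project actually sits.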

What are the pros and cons of each solution?

Lionel
  • Do you plan to fire up the Scrapy tasks from a Django view? That's usually not a good idea, because Scrapy tasks can take a long time and Django views are limited in lifetime. You should decouple the crawler from the web UI and use a queue for scheduling Scrapy tasks. – Pablo Hoffman Feb 19 '12 at 06:51
  • Hi Pablo, for the queue management I plan to use django-tasks. Because of that, I have to be careful about the PATH and have to write a script that runs Scrapy. – Lionel Feb 21 '12 at 17:53
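
The "script that runs Scrapy" mentioned in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the spider name, and the `mysite.settings` default are hypothetical, and it assumes Scrapy is installed in the same interpreter so that `python -m scrapy` works. Running the crawl as a subprocess with an explicit working directory and environment keeps the long crawl out of the web process and avoids relying on a globally configured PATH.

```python
import os
import subprocess
import sys


def build_crawl_command(spider_name):
    """Build the command line for one crawl.

    Using sys.executable reuses the current interpreter (and its
    virtualenv), so the `scrapy` script does not have to be on PATH.
    """
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def run_crawl(spider_name, scrapy_project_dir, django_settings="mysite.settings"):
    """Run one crawl from a queued task (all arguments are hypothetical).

    scrapy_project_dir should be the directory that contains scrapy.cfg.
    """
    # Copy the environment and name the Django settings for the child only.
    env = dict(os.environ, DJANGO_SETTINGS_MODULE=django_settings)
    return subprocess.run(
        build_crawl_command(spider_name),
        cwd=scrapy_project_dir,  # Scrapy locates scrapy.cfg from here
        env=env,
        check=True,  # raise CalledProcessError if the crawl fails
    )
```

A queued task would then call something like `run_crawl("example", "/path/to/crawler")` instead of starting the crawl inside a view.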
