I have multiple spider files in the same project. Each one targets a different domain, and they all share the same pipelines and settings.

I need to automate the Scrapy project with a scheduler like cron (I am on a Windows machine).

The project needs to run once per day. The results pipeline saves the data to MySQL.

Can anyone suggest an appropriate way to do this?

Tom Zych
Sabeena
  • Search here for `[bash] cron` or `crontab`. Good luck. – shellter Nov 16 '15 at 16:10
  • The simplest way I can think of is to create a batch file that invokes all the spiders (one line per spider) and run it with the Windows Task Scheduler. If you would rather run all the spiders from a single Python script, there are several approaches you can find by searching Stack Overflow. – mcubik Nov 16 '15 at 18:52
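
As a rough Python counterpart to the batch-file idea in the comment above, the sketch below runs `scrapy crawl <name>` once per spider, one after another. The spider names, `build_commands`, and `run_all` are hypothetical names chosen for illustration, and it assumes the `scrapy` command is on the PATH and that `project_dir` is the folder containing `scrapy.cfg`.

```python
import subprocess

# Hypothetical spider names; replace with the names defined in your project.
SPIDERS = ["testspider1", "testspider2", "testspider3"]


def build_commands(spiders):
    """Return one `scrapy crawl <name>` argument list per spider."""
    return [["scrapy", "crawl", name] for name in spiders]


def run_all(spiders, project_dir="."):
    """Run each spider in turn and return a {name: exit_code} mapping."""
    results = {}
    for name, cmd in zip(spiders, build_commands(spiders)):
        # Assumption: project_dir contains scrapy.cfg, so Scrapy can locate the project.
        results[name] = subprocess.run(cmd, cwd=project_dir).returncode
    return results
```

Calling `run_all(SPIDERS)` from a script gives one exit code per spider, which makes it easy to see which crawl failed when the job runs unattended.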

1 Answer

To run multiple spiders, you can try this code:

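from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project's settings.py so every spider shares the same pipelines.
process = CrawlerProcess(get_project_settings())

# Queue each spider by name; extra keyword arguments are passed to the spider.
process.crawl('testspider1', domain='domain1.com')
process.crawl('testspider2', domain='domain2.com')
process.crawl('testspider3', domain='domain3.com')

# Start the reactor; this blocks until all queued crawls have finished.
process.start()
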
Rahul
  • Thanks for your response. Can you elaborate? Where do we need to put this script, inside the Scrapy project? And how do we run it via an automated scheduler like cron? – Sabeena Nov 17 '15 at 07:33
  • Save this file under any name you like in the directory where your `settings.py` file resides. To run the spiders, execute the file like any other Python script, for example `python run.py`. To run it with crontab, refer to this [tutorial](http://www.adminschoice.com/crontab-quick-reference). Write the code, and comment here if you run into any difficulties. – Rahul Nov 17 '15 at 12:58
  • The script you provided is working. I am using a Windows machine, so I guess cron is not the right tool. Can you suggest an automated scheduler for Windows? – Sabeena Nov 18 '15 at 11:48
  • I don't have access to a Windows machine right now. I remember there is a Task Scheduler in Windows that works much like cron. Maybe this [tutorial](http://windows.microsoft.com/en-us/windows/schedule-task#1TC=windows-7) will help. Please refer to this [question](http://stackoverflow.com/questions/7195503/setting-up-a-cron-job-in-windows) too. – Rahul Nov 18 '15 at 12:00
  • Thanks for the help. If you have time, can you please look into [this SO question](http://stackoverflow.com/q/33825930/5355609?stw=2) about Scrapy logging? – Sabeena Nov 20 '15 at 11:48
  • Hi @Rahul, can you look into that other query? – Sabeena Nov 20 '15 at 14:53
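
For the Windows side of the discussion above, one possible way to register the daily run is the built-in `schtasks` command. The sketch below only builds and submits that command from Python. The task name `ScrapyDaily`, the 02:00 start time, and the helper names `build_schtasks_command` and `register_task` are assumptions made for illustration, not anything from the thread.

```python
import subprocess
import sys
from pathlib import Path


def build_schtasks_command(script_path, task_name="ScrapyDaily", start_time="02:00"):
    """Build a `schtasks /Create` argument list for a daily run of script_path."""
    # /TR is the command the task runs: the current interpreter plus the script.
    action = f'"{sys.executable}" "{Path(script_path).resolve()}"'
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",      # run once per day
        "/TN", task_name,    # hypothetical task name
        "/TR", action,
        "/ST", start_time,   # 24-hour HH:MM start time
    ]


def register_task(script_path):
    """Register the task; only meaningful on Windows, where schtasks exists."""
    return subprocess.run(build_schtasks_command(script_path), check=True)
```

Running `register_task("run.py")` once from the project directory should create the scheduled task. Scheduled tasks do not necessarily start in the project folder, so `run.py` may need to change into its own directory (for example with `os.chdir`) before calling `get_project_settings()`.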