How to make multiple Scrapy Spiders periodic and dynamic?

Question

I have some Scrapy spiders that crawl news from several newspapers. Currently I run them manually with this command:

scrapy crawl SpiderName

I am crawling news from 20 different newspapers, and I have 20 different spider classes to get this job done. So I have to run this command 20 times whenever I want to crawl the latest news. I want this to be automatic: a script that crawls all of these newspapers again and again at a fixed interval. I tried doing this with an infinite while loop, but it didn't work well. Is there a standard way to do this?

Heroku has a Scheduler add-on which you can use to run cron-style jobs: https://devcenter.heroku.com/articles/scheduler – Eternal, Mar 05 '20 at 09:37
Does this answer your question? [Running Multiple spiders in scrapy for 1 website in parallel?](https://stackoverflow.com/questions/39365131/running-multiple-spiders-in-scrapy-for-1-website-in-parallel) – parik, Mar 05 '20 at 14:24

score 1 · Answer 1 · answered Mar 05 '20 at 09:20

You can achieve this with scrapy-do, which runs spiders on a schedule.

1. Install

pip install scrapy-do

2. Schedule

  scrapy-do-cl schedule-job --project quotesbot \
        --spider toscrape-css --when 'every 5 to 15 minutes'
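
To cover all 20 spiders, one option is to issue the same schedule-job command once per spider. Below is a minimal sketch of that idea, not a definitive setup. It assumes the scrapy-do daemon is already running and the project has already been pushed to it. The project name and spider names are placeholders (hypothetical, not from the question); only the `--project`, `--spider` and `--when` flags come from the command above.

```python
import subprocess

# Hypothetical placeholders: replace with your own project and spider names.
PROJECT = "newsbot"
SPIDERS = ["newspaper_a", "newspaper_b"]
WHEN = "every 5 to 15 minutes"  # schedule spec copied from the command above


def build_schedule_command(project, spider, when):
    """Return the scrapy-do-cl argument list that schedules one spider."""
    return [
        "scrapy-do-cl", "schedule-job",
        "--project", project,
        "--spider", spider,
        "--when", when,
    ]


def schedule_all(project, spiders, when, run=subprocess.run):
    """Issue one schedule-job command per spider and return the commands.

    `run` is injectable so the function can be exercised without
    scrapy-do installed; by default it shells out via subprocess.run.
    """
    commands = [build_schedule_command(project, s, when) for s in spiders]
    for command in commands:
        # check=True raises CalledProcessError if the command fails.
        run(command, check=True)
    return commands


if __name__ == "__main__":
    schedule_all(PROJECT, SPIDERS, WHEN)
```

The script only needs to run once, since each job is registered with the same `--when` specification used in the command above. Building the argument list in a separate function keeps the command easy to inspect before anything is executed.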

answered Mar 05 '20 at 09:20

Bogdan Veliscu


I don't understand the schedule part. Where to run this command? Inside the spider folder? Does it crawl all the newspapers at a time? How to modify this command? – Protik Nag Mar 05 '20 at 09:35
