
I have 400 pages to crawl, for example, and some of them may return a 3xx or 4xx status. I would like the Scrapy job to stop automatically once the number of bad responses reaches, say, 100. Thanks!

james

1 Answer


You can use different systems to keep the error count:

  • A global variable in the class (not recommended, but probably the simplest solution; see the sketch below)
  • Storing the error count in the DB using an item pipeline

Once the counter reaches the number you have configured, you can stop the crawler with:

from scrapy.exceptions import CloseSpider

if errors > maxNumberErrors:
    raise CloseSpider('message error')
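
A minimal sketch of the first approach, tying the counter and CloseSpider together (the spider name, URL pattern, and threshold are made up; handle_httpstatus_list is needed because the default middlewares would otherwise redirect 3xx responses and filter out 4xx ones before they reach parse()):

import scrapy
from scrapy.exceptions import CloseSpider

class CountingSpider(scrapy.Spider):
    name = 'counting'
    # Hypothetical stand-in for the 400 pages to crawl
    start_urls = ['http://example.com/page/%d' % i for i in range(400)]
    # Let these statuses reach parse() instead of being swallowed
    # by the redirect/httperror middlewares
    handle_httpstatus_list = [301, 302, 404]

    max_errors = 100  # stop once this many bad responses are seen
    errors = 0        # the "global variable in the class"

    def parse(self, response):
        if response.status >= 300:
            self.errors += 1
            if self.errors >= self.max_errors:
                # Finish pending requests and shut down gracefully
                raise CloseSpider('too many bad responses')
            return
        # ... normal extraction for 2xx pages goes here ...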

Alternatively (from this answer), you can trigger a shutdown with:

# Legacy API: scrapy.project was removed in later Scrapy versions
from scrapy.project import crawler
crawler._signal_shutdown(9, 0)
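
If you are on a current Scrapy release, a hedged sketch of the same graceful-stop idea from inside a spider callback, using the crawler handle that every spider gets as self.crawler (the reason string is arbitrary):

# Ask the engine to close this spider; this is roughly what
# raising CloseSpider does under the hood
self.crawler.engine.close_spider(self, 'too many bad responses')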
ferran87