I am getting the above-mentioned error when trying to crawl a website. There are many posts on SO about similar issues, most notably this one: "Scrapy: HTTP status code is not handled or not allowed?", where the suggested fix is to change the user agent. However, my issue is a bit different. I did change the user agent and I still cannot run the scrapy crawl spidername command, yet scrapy shell "website.com" runs without a problem: inside the shell I get the response from the website and can parse the HTML. The error only happens when I run the crawl command.

What could be the issue? Here is my error message:

[Screenshot: error message]

I am even able to run the spider object from inside the shell without any errors.

[Screenshot: spider run from inside the shell]

Ach113

1 Answer

This might sound strange, but remove the trailing slash from the URL and it works.

Use this https://www.cigabuy.com/consumer-electroincs-c-56_75.html

and not this https://www.cigabuy.com/consumer-electroincs-c-56_75.html/
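If the start URLs are built or copied from somewhere else, a small helper can drop the trailing slash before the spider requests them. This is only a sketch using the standard library; strip_trailing_slash is a made-up name, not part of Scrapy, and it assumes the site behaves like this one and rejects the slashed form. Some sites need the trailing slash, so don't apply it blindly.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Return url with trailing slashes removed from its path.

    Hypothetical helper for illustration. An empty or root path
    is normalised to "/" so the result is still a valid URL.
    """
    parts = urlsplit(url)
    # Only the path is touched; query string and fragment are kept as-is.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# The cleaned list could then be assigned to the spider's start_urls.
start_urls = [
    strip_trailing_slash(
        "https://www.cigabuy.com/consumer-electroincs-c-56_75.html/"
    )
]
```

With the URL from the question, start_urls ends up holding the first form shown above, without the slash.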

Sagun Shrestha

Wow, that was the issue. I was 100% confident there weren't any issues with the spider object or the parse() method itself, since I could get it working from inside the shell. Thanks. – Ach113 Mar 29 '21 at 14:13