Scrapy: getting HTTP status code is not handled or not allowed only when crawling

Question

I am getting the above-mentioned error when trying to crawl a website. There are many posts on SO about a similar issue, most notably this one: Scrapy: HTTP status code is not handled or not allowed?, where it is suggested to change the user agent to prevent this error. However, my issue is a bit different. I did change the user agent and I am still unable to run the scrapy crawl spidername command, but I can run scrapy shell "website.com" without an issue, and I can even get the response from the website inside the shell and parse the HTML. The error only happens when I run the crawl command.

What could be the issue? Here is my error message:

I am even able to run the spider object from inside the shell without any errors.

score 1 · Accepted Answer · answered Mar 29 '21 at 13:54

This might sound strange, but remove the trailing slash from the URL and it works.

Use this https://www.cigabuy.com/consumer-electroincs-c-56_75.html

and not this https://www.cigabuy.com/consumer-electroincs-c-56_75.html/
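If the URLs come from somewhere you do not control, one option is to normalize them before handing them to the spider. Below is a minimal sketch using only the standard library; the helper name strip_trailing_slash and the raw_start_urls list are hypothetical, made up for illustration, and are not part of Scrapy.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Return url with trailing slashes removed from its path component.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    parts = urlsplit(url)
    # Only the path is touched; query string and fragment are preserved.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hypothetical list of URLs as they might be written in a spider.
raw_start_urls = [
    "https://www.cigabuy.com/consumer-electroincs-c-56_75.html/",
]

# The cleaned list is what would be assigned to the spider's start_urls.
start_urls = [strip_trailing_slash(u) for u in raw_start_urls]
print(start_urls)
```

Note that this strips the slash unconditionally. Some sites expect a trailing slash on directory-style URLs, so it is probably safest to apply it only to URLs that end in a file name such as .html, as in this case.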

Sagun Shrestha

Wow, that was the issue. I was 100% confident there weren't any issues with the spider object or the parse() method itself, since I could get it working from inside the shell. Thanks. – Ach113 Mar 29 '21 at 14:13
