Scrapy - 301 redirect in shell

Question

I cannot find a solution to the following problem. I am using Scrapy (latest version) and am trying to debug a spider. When I run scrapy shell https://jigsaw.w3.org/HTTP/300/301.html, the shell does not follow the redirect (it uses a default spider to fetch the page). When I run my spider, it follows the 301, but then I cannot debug the page.

How can I make the shell follow the 301 so I can debug the final page?

score 10 · Accepted Answer · answered Jul 31 '16 at 11:18


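Scrapy follows redirects with its RedirectMiddleware, but the shell bypasses it, so scrapy shell gives you the 301 response itself rather than the page it points to. (Scrapy 1.2 and later follow redirects in the shell by default.) A quick fix is to fetch the URL from the Location header yourself: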

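scrapy shell "https://jigsaw.w3.org/HTTP/300/301.html"
# then, inside the shell (header values are bytes, and Location may be a relative URL):
fetch(response.urljoin(response.headers['Location'].decode()))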
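To make clear what the shell is skipping, here is a minimal pure-Python sketch of the loop RedirectMiddleware performs: read the Location header, resolve it against the current URL, and repeat up to a limit. It is an illustration, not Scrapy's code: `follow_redirects` and the `get_response` callable are hypothetical names, and the canned responses with example.com URLs stand in for real HTTP requests. The status set and the default limit of 20 are assumed to mirror Scrapy's redirect handling and its REDIRECT_MAX_TIMES setting.

```python
from urllib.parse import urljoin

# Statuses assumed to be treated as redirects (when a Location header is present)
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url, get_response, max_times=20):
    """Follow Location headers by hand, roughly like RedirectMiddleware.

    get_response is a hypothetical callable: url -> (status, headers dict).
    Returns the final URL and the list of URLs that were redirected from.
    """
    redirect_urls = []
    while True:
        status, headers = get_response(url)
        location = headers.get("Location")
        if status not in REDIRECT_STATUSES or not location:
            # Not a redirect: this is the page you actually want to debug
            return url, redirect_urls
        if len(redirect_urls) >= max_times:
            raise RuntimeError(f"gave up after {max_times} redirects")
        redirect_urls.append(url)
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)


# Usage with canned responses (hypothetical URLs, no network access)
fake = {
    "https://example.com/old": (301, {"Location": "/new"}),
    "https://example.com/new": (200, {}),
}
final_url, chain = follow_redirects("https://example.com/old", fake.__getitem__)
print(final_url)  # https://example.com/new
print(chain)      # ['https://example.com/old']
```

The hop limit matters because a misconfigured server can redirect in a loop; without it the sketch would never return.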

To debug the spider itself, you can also inspect the response it receives while it runs:

from scrapy.shell import inspect_response

# a method on your spider class
def parse(self, response):
    inspect_response(response, self)
    # the crawl pauses here and opens an interactive shell with this response


Granitosaurus


thanks! This seems to be a quick fix which allows me to continue! – Pixelartist Jul 31 '16 at 11:23
@Pixelartist no problem, see my edit for more info regarding debugging spiders properly. – Granitosaurus Jul 31 '16 at 11:23
I think the additional edit is really the full solution. I was hoping the shell's behavior could be configured, but this solves it. – Pixelartist Jul 31 '16 at 13:30
