I cannot find a solution to the following problem. I am using Scrapy (latest version) and am trying to debug a spider. With scrapy shell https://jigsaw.w3.org/HTTP/300/301.html the redirect is not followed (the shell uses a default spider to fetch the data). When I run my own spider, it follows the 301, but then I cannot debug it.

How can I make the shell follow the 301 so that I can debug the final page?

Pixelartist

1 Answer

Scrapy handles redirects with its Redirect Middleware, but that middleware is not enabled in the shell. A quick fix is to fetch the redirect target yourself from inside the shell:

scrapy shell "https://jigsaw.w3.org/HTTP/300/301.html"
fetch(response.headers['Location'])
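One caveat with the quick fix: a Location header may hold a relative URL, and Scrapy typically returns header values as bytes. Below is a minimal sketch of a helper that turns the header into an absolute URL string first. The name redirect_target is hypothetical (it is not part of Scrapy), and the sketch assumes the header is UTF-8 encoded.

```python
from urllib.parse import urljoin


def redirect_target(base_url, location):
    """Return the absolute URL a redirect points to.

    base_url -- URL of the response that carried the redirect
    location -- value of its Location header, as bytes or str
    (hypothetical helper, not a Scrapy API)
    """
    if isinstance(location, bytes):
        # Assumption: the header value is UTF-8 encoded.
        location = location.decode("utf-8")
    # urljoin resolves relative targets against base_url and
    # returns absolute targets unchanged.
    return urljoin(base_url, location)
```

Inside the shell it could then be used as fetch(redirect_target(response.url, response.headers['Location'])).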

Also, to debug your spider, you probably want to inspect the response it is receiving:

from scrapy.shell import inspect_response
def parse(self, response):
    inspect_response(response, self)
    # the spider will stop here and open up an interactive shell during the run
Granitosaurus