I'm having trouble with Scrapy (Python).
I have a spider that attempts to log in to a site before crawling it. However, the site returns HTTP 401 on the login page, which stops the spider from continuing, even though the body of that response contains the login form to submit.
Here are the relevant parts of my crawler:
from scrapy import log
from scrapy.spider import Spider


class LoginSpider(Spider):
    name = "login"
    start_urls = ["https://example.com/login"]

    def parse(self, response):
        # Initial user/pass submit
        self.log("Logging in...", level=log.INFO)
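For context, the rest of parse() (trimmed here) would submit the form, roughly like this; the field names and callback are placeholders for the real ones:

from scrapy.http import FormRequest

    def parse(self, response):
        self.log("Logging in...", level=log.INFO)
        # "username"/"password" are placeholder field names
        return FormRequest.from_response(
            response,
            formdata={"username": "myuser", "password": "mypass"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Continue the crawl once authenticated
        ...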
The above yields:
2014-02-23 11:52:09+0000 [login] DEBUG: Crawled (401) <GET https://example.com/login> (referer: None)
2014-02-23 11:52:09+0000 [login] INFO: Closing spider (finished)
However, if I give it another URL to start on (not the login page) that returns a 200:
2014-02-23 11:50:19+0000 [login] DEBUG: Crawled (200) <GET https://example.com/other-page> (referer: None)
2014-02-23 11:50:19+0000 [login] INFO: Logging in...
You can see that it goes on to execute my parse() method and make the log entry.
How do I make Scrapy continue to work with the page despite a 401 response code?
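From the docs I've found the spider attribute handle_httpstatus_list (and the related HTTPERROR_ALLOWED_CODES setting), which is supposed to let the listed non-2xx responses through to the callback instead of being dropped. Is something like this the right approach?

class LoginSpider(Spider):
    name = "login"
    start_urls = ["https://example.com/login"]
    # Allow 401 responses to reach parse() instead of being
    # filtered out by the HttpError middleware
    handle_httpstatus_list = [401]

    def parse(self, response):
        ...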