I'm trying to do an authenticated Scrapy login with InitSpider. For some reason, with InitSpider it always fails to log in. My code is similar to the answer in the post below:
Crawling LinkedIn while authenticated with Scrapy
The response I see in logs is this:
2012-12-20 22:56:53-0500 [linked] DEBUG: Redirecting (302) to <GET https://example.com/> from <POST https://example.com/>
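For context on that log line: a 302 in response to a form POST is often just a site's ordinary post/redirect/get flow, so the redirect on its own may not mean the login was rejected. The following is only a minimal standard-library sketch of that flow, not the real site. The toy server, the /login path, and the helper name post_then_follow are all hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical toy site: POST /login answers 302, GET / serves the landing page."""

    def do_POST(self):
        # Read the submitted form body, then redirect instead of rendering a page.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(302)
        self.send_header("Location", "/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"Welcome! Sign Out"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_then_follow(host, port):
    """Return (POST status, Location header, body of the followed GET)."""
    conn = http.client.HTTPConnection(host, port)
    conn.request(
        "POST",
        "/login",
        body="username=u&password=p",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    first = conn.getresponse()
    first.read()
    location = first.getheader("Location")
    conn.close()

    # Follow the redirect by hand with a GET, as a crawler's redirect handling would.
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", location)
    second = conn.getresponse()
    page = second.read().decode()
    conn.close()
    return first.status, location, page


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, location, page = post_then_follow("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
print(status, location, "Sign Out" in page)
```

In this toy flow the "Sign Out" marker only appears in the page fetched after the redirect, which is the response a login callback would normally be given once the redirect has been followed.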
Using the code from the above post, I have the same init_request, login, and check_login_response functions. I can see with log statements that it reaches the login function, but it seems to never reach the check_login_response function.
When I re-implement the code using BaseSpider and do the FormRequest in the parse function, I'm able to log in with no issue. Is there a reason for this? Is there something else I should be doing? Why am I getting a redirect when logging in with InitSpider?
[EDIT]
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request, FormRequest

class DemoSpider(InitSpider):
    name = 'linked'
    login_page = # Login URL
    start_urls = # All other urls

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(response,
                    formdata={'username': 'username', 'password': 'password'},
                    callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are successfully logged in."""
        if "Sign Out" in response.body:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin..
            return self.initialized()
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        self.log('got to the parse function')
Above is my spider code.