Scrapy login issue with InitSpider

Question

I'm trying to log in to a site with Scrapy before crawling, using InitSpider. For some reason, the login ALWAYS fails with InitSpider. My code is similar to the answer in the post below:

Crawling LinkedIn while authenticated with Scrapy

The response I see in the logs is this:

2012-12-20 22:56:53-0500 [linked] DEBUG: Redirecting (302) to <GET https://example.com/> from <POST https://example.com/>

Using the code from the above post, I have the same init_request, login, and check_login_response functions. Log statements show that the spider reaches the login function, but it never seems to reach check_login_response.

When I re-implement the code using BaseSpider and issue the FormRequest from the parse function, I'm able to log in with no issue. Is there a reason for this? Is there something else I should be doing? Why am I getting a redirect when logging in with InitSpider?

[EDIT]

# Import paths as used by Scrapy releases of that era (2012).
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request, FormRequest

class DemoSpider(InitSpider):
    name = 'linked'
    login_page = ''  # Login URL
    start_urls = []  # All other urls

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(response,
            formdata={'username': 'username', 'password': 'password'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are successfully logged in."""
        if "Sign Out" in response.body:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin..
            return self.initialized()
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        self.log('got to the parse function')

Above is my spider code.

When I have this kind of issue, I debug the HTTP requests and responses with Wireshark. Which version of Scrapy are you using? - llazzaro, Dec 21 '12 at 17:41

KVISH · Accepted Answer · 2013-02-15T23:16:25.303

After struggling with this for a bit, I figured it out and posted the solution on my blog:

http://tmblr.co/ZjkSZteCOTyH

Basically, I use BaseSpider and override the start_requests method to handle the login, so the login request is sent before any of the start URLs are crawled.

edited Feb 15 '13 at 23:16

answered Dec 21 '12 at 20:20
