
I am new to Scrapy and decided to try it out because of the good online reviews. I am trying to log in to a website with Scrapy. I have successfully logged in with a combination of Selenium and Mechanize by collecting the needed cookies with Selenium and adding them to Mechanize. Now I am trying to do something similar with Scrapy and Selenium, but I can't seem to get anything to work. I can't really even tell if anything is working or not. Can anyone please help me? Below is what I've started on. I may not even need to transfer the cookies with Scrapy, but I can't tell if it ever actually logs in. Thanks

from scrapy.spider import BaseSpider
from scrapy.http import Response,FormRequest,Request
from scrapy.selector import HtmlXPathSelector
from selenium import webdriver

class MySpider(BaseSpider):
    name = 'MySpider'
    start_urls = ['http://my_domain.com/']

    def get_cookies(self):
        driver = webdriver.Firefox()
        driver.implicitly_wait(30)
        base_url = "http://www.my_domain.com/"
        driver.get(base_url)
        driver.find_element_by_name("USER").clear()
        driver.find_element_by_name("USER").send_keys("my_username")
        driver.find_element_by_name("PASSWORD").clear()
        driver.find_element_by_name("PASSWORD").send_keys("my_password")
        driver.find_element_by_name("submit").click()
        cookies = driver.get_cookies()
        driver.close()
        return cookies

    def parse(self, response,my_cookies=get_cookies):
        return Request(url="http://my_domain.com/",
            cookies=my_cookies,
            callback=self.login)

    def login(self,response):
        return [FormRequest.from_response(response,
            formname='login_form',
            formdata={'USER': 'my_username', 'PASSWORD': 'my_password'},
            callback=self.after_login)]

    def after_login(self, response):
        hxs = HtmlXPathSelector(response)
        print hxs.select('/html/head/title').extract()
JonDog

1 Answer


Your question is more of a debugging issue, so my answer will just have some notes on your question, not an exact answer.

def parse(self, response,my_cookies=get_cookies):
    return Request(url="http://my_domain.com/",
        cookies=my_cookies,
        callback=self.login)

my_cookies=get_cookies - you are assigning a function here, not the result it returns. I think you don't need to pass any function as a parameter here at all. It should be:

def parse(self, response):
    return Request(url="http://my_domain.com/",
        cookies=self.get_cookies(),
        callback=self.login)

The cookies argument for Request should be a dict - please verify that it is indeed a dict.
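For reference, Selenium's `driver.get_cookies()` returns a list of dicts (each with `name`, `value`, `domain`, etc.), while a plain `{name: value}` dict is the simplest form to hand to `Request`. A minimal sketch of the conversion (the cookie names and values below are made up for illustration):

```python
# Selenium's get_cookies() returns a list of dicts like these
# (values here are invented for the example):
selenium_cookies = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': '.my_domain.com'},
    {'name': 'csrftoken', 'value': 'xyz789', 'domain': '.my_domain.com'},
]

# Collapse the list into a single {name: value} dict for Scrapy's Request.
scrapy_cookies = {c['name']: c['value'] for c in selenium_cookies}

print(scrapy_cookies)  # -> {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
```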

I can't really even tell if anything is working or not.

Put some prints in the callbacks to follow the execution.
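A sketch of what that tracing looks like - a print at the top of each callback, so the console shows which ones actually fire and with which response (the `FakeResponse` class here is a stand-in so the sketch runs without Scrapy; in the real spider the argument is a `scrapy.http.Response`):

```python
class FakeResponse(object):
    """Stand-in for scrapy.http.Response, just for this sketch."""
    def __init__(self, url, status=200):
        self.url = url
        self.status = status

# In the spider, the same idea is one print line per callback:
def login(response):
    print('login() called: %s (status %s)' % (response.url, response.status))

def after_login(response):
    print('after_login() called: %s (status %s)' % (response.url, response.status))

login(FakeResponse('http://my_domain.com/'))
after_login(FakeResponse('http://my_domain.com/welcome'))
```

If `after_login()` never prints, the chain broke earlier - either the `Request` in `parse()` was never issued or the `FormRequest` failed.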

warvariuc
  • After fixing the issues that you noted I was able to successfully log in! Note: the cookies that Selenium returned were a list of dictionaries, which had to be changed into a single dictionary. Thanks a lot for the help. – JonDog Jun 26 '12 at 13:44
  • Sorry, I'm new to Stack Overflow. I tried to vote up, but it says I need 15 reputation before I can vote. I don't see any other way to mark as answered either. UPDATE - OK, I clicked the check mark. I think that is it. – JonDog Jun 26 '12 at 17:06
  • @JonDog, could you please post how you handled the cookie conversion from one form to the other? – Amistad Feb 10 '15 at 05:46
  • Actually, in Scrapy 0.24, which is the latest version, the cookie can be either a dict or a list of dicts. – Amistad Feb 11 '15 at 11:32