I am trying to crawl an authenticated website with the code below. I log in successfully, but when I send another FormRequest I am redirected to the login page again. It seems that Scrapy is not keeping the session/cookies?

The login example in the Scrapy docs contains the comment `# continue scraping with authenticated session...`. If the session is not kept when I send another request, what does that comment mean?

Any ideas? Thank you.

import scrapy
from scrapy.utils.response import open_in_browser


class LoginSpider(scrapy.Spider):
    name = 'login_spider'
    start_urls = ['https://example.com/login']

    def parse(self, response):
        # Fill in and submit the login form found on the login page.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'username', 'password': 'password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        if "Authenticated" in response.text:
            # continue scraping with authenticated session...
            url = 'https://example.com/search'
            yield scrapy.FormRequest(
                url,
                formdata={'from': '09/24/2017', 'to': '09/25/2017'},
                callback=self.parse_something,
            )
        else:
            self.logger.error("Login failed")

    def parse_something(self, response):
        # Inspect what the search request returned (login page or results).
        open_in_browser(response)
        self.logger.error(response.body)
Joseph
  • Possible duplicate of [Scrapy - how to manage cookies/sessions](https://stackoverflow.com/questions/4981440/scrapy-how-to-manage-cookies-sessions) – Wiggy A. Sep 25 '17 at 11:49
  • Thank you for your reply, but that thread does not explain what `# continue scraping with authenticated session...` means in the Scrapy documentation. – Joseph Sep 25 '17 at 12:22
  • Basically just keep the same meta value for the key "cookiejar" in every subsequent request for the same authenticated user. – Wiggy A. Sep 25 '17 at 12:53
  • @WiggyA., I think a cookiejar is only needed when we want to maintain separate cookies for separate sessions. Cookies are enabled by default, so no meta should be needed. – Tarun Lalwani Sep 25 '17 at 14:55

0 Answers