
I need to book an appointment on a website. These appointments are released sporadically and get booked up quickly. Even to see the available appointment times, you have to log in & complete a reCAPTCHA. If I wanted to write a scraper using Headless Chrome to continually poll the site and notify me when a new appointment comes up, following the login flow on each run would mean beating the reCAPTCHA, which is at least non-trivially difficult.
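
As a sketch of what that scraper loop would look like (in Python; `fetch_page`, `parse_slots`, and `notify` are hypothetical callables standing in for the actual scraping and notification pieces, not anything from the site):

```python
import time

def watch(fetch_page, parse_slots, notify, interval_s=300, rounds=None):
    """Poll the appointments page and notify once per newly seen slot.

    fetch_page() returns the page HTML, parse_slots(html) returns the
    appointment times found in it, notify(slots) alerts the user.
    rounds caps the number of iterations (useful for testing);
    None means poll forever.
    """
    seen = set()
    i = 0
    while rounds is None or i < rounds:
        slots = set(parse_slots(fetch_page()))
        new = slots - seen
        if new:
            notify(sorted(new))   # only alert on slots we haven't seen
        seen |= slots
        i += 1
        time.sleep(interval_s)
```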

A better approach (I thought) would be to log in once manually, grab my session cookies, and then load them into Headless Chrome before requesting the appointment-times page directly. The server would see my request, see my session cookies, and respond as if the manually logged-in session had simply been resumed. This is pretty much what's outlined in the answer to this StackOverflow question: how to manage log in session through headless chrome?
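
For reference, a minimal sketch of that cookie-loading approach in Python with Selenium (assuming an EditThisCookie-style JSON export; the URLs and file name are placeholders, and the selenium/chromedriver setup is illustrative, not part of the original question):

```python
import json

def to_selenium_cookies(exported_json: str) -> list:
    """Convert an EditThisCookie-style JSON export into dicts that
    Selenium's driver.add_cookie() will accept."""
    keep = {"name", "value", "path", "domain", "secure", "httpOnly"}
    cookies = []
    for c in json.loads(exported_json):
        cookie = {k: v for k, v in c.items() if k in keep}
        # EditThisCookie calls the expiry field "expirationDate";
        # Selenium expects an integer "expiry"
        if "expirationDate" in c:
            cookie["expiry"] = int(c["expirationDate"])
        cookies.append(cookie)
    return cookies

def load_into_headless_chrome(cookies_path: str, target_url: str):
    """Illustrative only: requires selenium and a chromedriver install."""
    from selenium import webdriver
    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    # add_cookie() only applies to the domain of the currently loaded
    # page, so visit the site once before injecting the cookies
    driver.get(target_url)
    with open(cookies_path) as f:
        for cookie in to_selenium_cookies(f.read()):
            driver.add_cookie(cookie)
    driver.get(target_url)  # re-request with the session attached
    return driver
```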

But this doesn't work, and I can't figure out why. I get redirected straight back to the login page every time. I've tried in Chrome & Firefox, and with several other login-requiring websites (Facebook, Reddit, etc.).

How can these servers possibly discern between the original client and the one using copied cookies, when the cookies are what the servers use to identify clients in the first place?

Exact steps to reproduce:

  1. Log in to a site of your choice in Chrome, let's say Facebook.
  2. Export your cookies to your clipboard from the site using the EditThisCookie extension.
  3. Launch an incognito window (to reset your active cookies) and import those session cookies with the same handy extension.
  4. Navigate to the target, past-the-login-form URL.
  5. Get redirected.
  6. Get frustrated.
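
Step 5 can also be observed outside the browser. A small Python sketch (the cookie values and URL below are placeholders) that sends the copied cookies with a plain HTTP request and surfaces the redirect instead of silently following it:

```python
import json
import urllib.error
import urllib.request

def cookie_header(exported_json: str) -> str:
    """Build a Cookie request header from an EditThisCookie-style JSON export."""
    cookies = json.loads(exported_json)
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None makes urllib raise HTTPError on a 3xx
    # instead of transparently following it
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def probe(url: str, exported_json: str):
    """Illustrative only: prints where the server sends the request."""
    req = urllib.request.Request(
        url, headers={"Cookie": cookie_header(exported_json)}
    )
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(req)
        print(resp.status, "no redirect")
    except urllib.error.HTTPError as e:
        # A 302 whose Location is the login page reproduces the problem
        print(e.code, e.headers.get("Location"))
```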
Monty Evans