
I am trying to develop a scraper for various sites like angel.co. I'm stuck designing a crawler for the www.owler.com website, as it requires logging in through email when we try to access information about a company.

Each time we log in we get a new login token by email, which expires after some time. So, is there any proper solution to preserve the login session in the browser session using Selenium with the Python bindings?

I'm just looking for guidelines on how to handle this type of situation. I've already tried automating this task using Selenium, but it wasn't a fruitful approach.


1 Answer


I got you man! YES, this can be done via Selenium, but it will take some advanced knowledge of Selenium and a basic understanding of how users are authenticated on websites via cookies.

Off the top of my head you have the following options:

  1. Storing the email-received authentication link and injecting its token into your browser session in the form of a cookie;
  2. Storing your session in the form of a Selenium profile specific to the browser you're running your tests on, and loading it afterwards in the instance spawned by your script.

1. (Note: This worked like a charm from the first go so follow closely.)

  • Open www.owler.com in an incognito window (I am using Chrome) and open the cookies section;
  • Spot the cookies you are working with (see this print-screen);
  • Sign In in order to receive your email. Inspect the Sign-In link (see this print-screen);
  • Copy & load the link into another browser (not your incognito session);
  • Once you are logged in, open the browser console (F12, or CTRL+Shift+J on Chrome) > go to the Application tab > click on the Cookies section (for the Owler domain) and copy the value of the OWLER_PC cookie (see this print-screen for more details);
  • In your anonymous session (not logged in), go to the browser console and add the auth token in the form of a cookie, via the document.cookie property, like this: document.cookie="OWLER_PC=<yourTokenHere>";
  • Refresh the page 2 times, and VOILA, you are logged in.

Note: I knew that you have to add that cookie as OWLER_PC because I inspected the logged-in session and that was the only new cookie. The cookie's value is (usually) the same as the authentication token you receive via email.

Now all that is left to do is simulate this via code. You have to store one of these email authentication tokens in your script (notice they expire in 1 year, so you should be good).

Then, once you've opened your session, use the Selenium bindings for the framework/language you are using to add said cookie, then refresh the page. For WebdriverIO/JavaScript (my weapons of choice) it goes something like this:

// Inject the stored auth cookie, then refresh twice
browser.setCookie({name: 'OWLER_PC', value: 'SPF-yNNJSXeXJ...'});
browser.refresh();
browser.refresh();
// Assert you are logged in
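
Since your question is about the Python bindings, here is a minimal equivalent sketch using selenium for Python with Chrome. The cookie name OWLER_PC comes from the steps above; the token value is a placeholder you would replace with the one from your email link:

# Minimal sketch with the Selenium Python bindings (assumes chromedriver is on PATH)
from selenium import webdriver

# Placeholder: paste the authentication token from your email link here
AUTH_TOKEN = 'SPF-yNNJSXeXJ...'

driver = webdriver.Chrome()
# You must be on the owler.com domain before you can add a cookie for it
driver.get('https://www.owler.com')
driver.add_cookie({'name': 'OWLER_PC', 'value': AUTH_TOKEN})
driver.refresh()
driver.refresh()
# Assert you are logged in, e.g. by checking for an element only shown to logged-in users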

2. Sometimes you don't want to add cookies or write boilerplate code just to be logged into a website, or you want a specific set of browser extensions loaded on your Selenium driver instance. For that you use browser profiles.

You will have to read up on it yourself, as it is a lengthy topic. This question might also help you, as you are using the Python Selenium bindings.
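
If you go the profile route with the Python bindings, a rough sketch (assuming Chrome and a user-data directory of your choosing; the path below is just an example) would be:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Reuse a persistent user-data directory so cookies survive between runs (example path)
options.add_argument('--user-data-dir=/path/to/owler-profile')

driver = webdriver.Chrome(options=options)
driver.get('https://www.owler.com')
# Log in manually once; later runs with the same profile should already be logged in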

Hope this helps!

  • Thanks @iamdanchiv . Really thanks a lot. :) – akash dwivedi May 27 '17 at 06:19
  • @akashdwivedi NP man, hope it will work for you. I noticed I made a mistake when detailing the flow. Let me update the answer. Check in 5 minutes. – iamdanchiv May 27 '17 at 06:21
  • @akashdwivedi I've updated the question. I've also suggested the `Profiles` next-steps in case you want to do it that way. By the way, don't forget to leave some love if this fixes your trouble and upvote/mark as the answer. – iamdanchiv May 27 '17 at 06:41
  • Hey, I've finished creating the crawler, but there is a strange problem: the pages I saved locally have the proper page source code, but when I load a local file there is no data shown in the browser. Any suggestions? – akash dwivedi May 27 '17 at 08:35
  • @akashdwivedi what do you mean by `when I load that local file there are no data`? Did you go with the `Profile` approach? Update your question with the snippet of code where you are experiencing the issue, eventually add some logs if any are available. Else it's going to be hard to debug. – iamdanchiv May 27 '17 at 08:41
  • I followed the Selenium bindings approach to download the entire webpage. The webpage is saved locally now and is supposed to look the same as the online version, but when I load that file (HTML page) locally on my machine it shows nothing. Give me your email or something and I'll mail you my downloaded file. – akash dwivedi May 27 '17 at 15:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/145272/discussion-between-iamdanchiv-and-akash-dwivedi). – iamdanchiv May 27 '17 at 15:24
  • Well, I figured it out: there is some JavaScript that is doing all of this. Everything works fine in a browser with JavaScript disabled. – akash dwivedi May 27 '17 at 15:25
  • 1
    works like a charm, thanks! – Muhammad Talal Oct 10 '20 at 05:16