
I am trying to scrape a website protected by Shibboleth authentication. I need to log in and read its content programmatically.

I successfully logged in using the Python Mechanize package. However, the content I am looking for is loaded with JavaScript, and Mechanize doesn't execute JavaScript.

To work around this, I tried to log in using PhantomJS, which does handle JavaScript, but the website violently slammed the door in my face: "In order to access the resource, you must authenticate yourself".

I realize that I need both tools to achieve my task:

  1. Mechanize for a successful login,
  2. PhantomJS to hopefully get my data (?).

All I need is to pass the cookies from Mechanize to PhantomJS. Is that possible?

Mechanize

#saving Mechanize's cookies
cj.save("MechanizeCookies.txt")
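One thing worth checking at this step, assuming `cj` is a Mechanize `LWPCookieJar` whose `save()` behaves like the standard library's `http.cookiejar.FileCookieJar.save(filename, ignore_discard=False, ignore_expires=False)`: by default, session cookies (cookies with no expiry date) are not written to the file, and login systems often keep the authenticated session in exactly such a cookie. The sketch below uses only `http.cookiejar` to show the effect; the cookie names and `example.fr` domains are hypothetical.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain, path="/", expires=None):
    """Build a cookie by hand; expires=None makes it a session cookie."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=domain.startswith("."),
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=True, expires=expires,
        discard=expires is None,  # session cookies are flagged "discard"
        comment=None, comment_url=None, rest={},
    )


def saved_cookie_names(jar, ignore_discard):
    """Save the jar to a temp file and return the names that survive a reload."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        jar.save(path, ignore_discard=ignore_discard, ignore_expires=True)
        reloaded = LWPCookieJar()
        reloaded.load(path, ignore_discard=True, ignore_expires=True)
        return sorted(c.name for c in reloaded)
    finally:
        os.remove(path)


jar = LWPCookieJar()
# Persistent cookie (has an expiry date) and a hypothetical session cookie.
jar.set_cookie(make_cookie("_saml_idp", "aHR0...", ".example.fr", expires=2000000000))
jar.set_cookie(make_cookie("_shibsession_hypothetical", "abc", "ent.example.fr"))

print(saved_cookie_names(jar, ignore_discard=False))  # ['_saml_idp']
print(saved_cookie_names(jar, ignore_discard=True))   # ['_saml_idp', '_shibsession_hypothetical']
```

If this is the cause, saving with `cj.save("MechanizeCookies.txt", ignore_discard=True, ignore_expires=True)` should also write the session cookies to the file.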

MechanizeCookies.txt

#LWP-Cookies-2.0
Set-Cookie3: _saml_idp=aHR0cHM6Ly9zaGliYm9sZXRoLmVuc2ljYWVXXXXXX; path="/"; domain=".xxxxxxx.fr"; path_spec; domain_dot; expires="2017-02-19 19:08:16Z"; version=0
Set-Cookie3: org.jasig.portal.PORTLET_COOKIE=OgkWalk7G5Woc3Vy_LdMdLakE8GHXXXXXXXX; path="/uPortal/"; domain="ent.xxxxxxx.fr"; path_spec; expires="2015-05-26 19:08:23Z"; version=0
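PhantomJS's `--cookies-file` option reads its own storage format, not the LWP format shown above, so one possible route is to convert this file into the cookie objects that `phantom.addCookie()` accepts (`name`, `value`, `domain`, `path`, `httponly`, `secure`, `expires`). This is a minimal sketch assuming Python 3's `http.cookiejar` can read Mechanize's LWP output; the helper names and the output file name are hypothetical, and `expires` is given in milliseconds following the example in PhantomJS's `phantom.addCookie` documentation.

```python
import json
from http.cookiejar import LWPCookieJar


def lwp_to_phantom_cookies(path):
    """Hypothetical helper: read an LWP cookie file (the format cj.save() writes)
    and return dicts shaped like the object phantom.addCookie() accepts."""
    jar = LWPCookieJar()
    # Keep session cookies and cookies whose expiry date has passed;
    # the server, not this script, decides whether they are still valid.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    cookies = []
    for c in jar:
        cookie = {
            "name": c.name,
            "value": c.value,
            "domain": c.domain,
            "path": c.path,
            "secure": bool(c.secure),
            "httponly": c.has_nonstandard_attr("HttpOnly"),
        }
        if c.expires is not None:
            # Seconds since the epoch -> milliseconds (assumed PhantomJS unit)
            cookie["expires"] = c.expires * 1000
        cookies.append(cookie)
    return cookies


def write_phantom_cookie_json(lwp_path, json_path):
    """Hypothetical helper: dump the converted cookies to a JSON file."""
    with open(json_path, "w") as f:
        json.dump(lwp_to_phantom_cookies(lwp_path), f, indent=2)
```

For example, `write_phantom_cookie_json("MechanizeCookies.txt", "cookies.json")`. A PhantomJS script could then read `cookies.json` with `require('fs').read()`, `JSON.parse` it, and call `phantom.addCookie()` on each object before `page.open()`.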

PhantomJS

Here is my attempt with PhantomJS; result.png shows the login form instead of the page content.

var page = require('webpage').create();
page.settings.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36';
page.open('https://ent.xxxxxxx.fr/home', function (status) {
    // Screenshot whatever page was served (here, the login form)
    page.render('result.png');
    phantom.exit(); // stop PhantomJS once the screenshot is written
});

$ phantomjs --cookies-file=cookie.txt cookieloader.js

How could I load those Mechanize cookies into PhantomJS, CasperJS or any other library's script?
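For the Selenium route suggested in the comments, cookies are added with `WebDriver.add_cookie(cookie_dict)`, which takes the expiry as `expiry` in whole seconds rather than milliseconds. A minimal sketch of the conversion, assuming the Mechanize jar can be iterated like a standard `http.cookiejar.CookieJar`; `selenium_cookie_dicts` is a hypothetical helper name:

```python
from http.cookiejar import CookieJar


def selenium_cookie_dicts(jar: CookieJar) -> list:
    """Hypothetical helper: convert cookies from an http.cookiejar-style jar
    (assumed to include Mechanize's) into dicts for WebDriver.add_cookie()."""
    dicts = []
    for c in jar:
        d = {
            "name": c.name,
            "value": c.value,
            "domain": c.domain,
            "path": c.path,
            "secure": bool(c.secure),
        }
        # Selenium expects 'expiry' as whole seconds since the epoch;
        # session cookies (expires is None) simply omit it.
        if c.expires is not None:
            d["expiry"] = int(c.expires)
        dicts.append(d)
    return dicts
```

Selenium only accepts a cookie for the domain of the page currently loaded, so the driver would first `get()` a page on `ent.xxxxxxx.fr`, then call `driver.add_cookie(d)` for each dict, and only then load `/home`.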

Stéphane Bruckert
  • Can you not just use Selenium with PhantomJS? – Padraic Cunningham May 26 '14 at 21:16
  • I just followed your advice @PadraicCunningham and the [result](http://stackoverflow.com/a/23929939/1515819) is a success! Thank you! The question is not answered though, so I will leave it open. Someone might find an answer sooner or later. – Stéphane Bruckert May 29 '14 at 18:19
