How can I scrape data from a website protected with Shibboleth?

Question

I am attempting to scrape data from one of my University's websites, which uses Shibboleth for authentication. However, I am having difficulty determining the best way to get through the login and reach the page I wish to scrape. I have valid credentials that I can use to log in. Does anyone have any suggestions for how to accomplish this task?

@Ibu Why? He's not asking how to bypass the security, merely how to login programmatically. — Matthew Scharley, May 25 '11 at 04:09

score 1 · Answer 1 · edited May 23 '17 at 10:27

I have been working on scripting Shibbolized login with success (in my case, to monitor the health of both the Shibboleth IdP and the applications it protects).

I am using Python's urllib module and its classes to follow the redirects, pass the cookies that Shibboleth relies on, and post the login form. After a little tinkering, urllib gets you most of the way to a successful Shibbolized login. You could use this approach to handle the initial login to the Shibbolized website and then do the scraping with straightforward use of Python's urllib.

Example Python script for logging into Shibboleth
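
The flow described above (follow redirects, keep cookies, post the login form) might be sketched roughly as follows with the Python 3 standard library. This is only a sketch under assumptions, not the script referred to above: the field names `j_username` and `j_password` are the usual defaults of the Shibboleth IdP login page but may differ at your institution, the helper names (`FormParser`, `parse_form`, `submit_form`, `shibboleth_login`) are made up for illustration, and it assumes the login page and the SAML relay page each contain the relevant form as the first form on the page.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar


class FormParser(HTMLParser):
    """Collect the action URL and input fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            kind = (attrs.get("type") or "").lower()
            unchecked = kind in ("checkbox", "radio") and "checked" not in attrs
            if not unchecked:  # browsers do not send unchecked boxes
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def parse_form(html):
    """Return (action, fields) for the first form in the given HTML."""
    parser = FormParser()
    parser.feed(html)
    return parser.action, parser.fields


def submit_form(opener, page_url, html, extra=None):
    """POST the first form on the page, optionally overriding some fields."""
    action, fields = parse_form(html)
    fields.update(extra or {})
    target = urllib.parse.urljoin(page_url, action or page_url)
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return opener.open(target, data)  # passing data makes this a POST


def shibboleth_login(protected_url, username, password):
    """Log in and return (opener, response); reuse the opener for scraping."""
    # The cookie jar carries the IdP and SP session cookies across redirects.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))

    # 1. Requesting the protected page redirects to the IdP login form.
    resp = opener.open(protected_url)
    html = resp.read().decode("utf-8", "replace")

    # 2. Post the credentials (field names are an assumption, see above).
    resp = submit_form(opener, resp.geturl(), html,
                       {"j_username": username, "j_password": password})
    html = resp.read().decode("utf-8", "replace")

    # 3. The IdP normally answers with a form carrying SAMLResponse that a
    #    browser submits via JavaScript; here it is posted by hand.
    if "SAMLResponse" in html:
        resp = submit_form(opener, resp.geturl(), html)
    return opener, resp
```

After a call such as `opener, resp = shibboleth_login(url, user, password)`, further pages could be fetched with `opener.open(...)`, since the opener holds the session cookies. Pages that insert extra steps (for example an attribute-release consent screen) would need an additional `submit_form` call.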

score 0 · Answer 2 · answered Jul 12 '12 at 10:22

I believe the ECP profile was designed to let non-browser clients (e.g. command-line tools) access Shibboleth-protected resources.

Try one of the sample clients available on the Shibboleth wiki page I linked above.
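
As a rough illustration of what such a client does first, the sketch below builds the initial ECP request to the protected resource using Python's standard library. It is not one of the sample clients mentioned above. The header values follow the SAML 2.0 ECP profile as far as I know; the function name `build_ecp_request` and the example URL are made up, and ECP generally has to be enabled on both the IdP and the service provider for any of this to work.

```python
import urllib.request

# Media type and PAOS header that mark the client as ECP-capable.
PAOS_MEDIA_TYPE = "application/vnd.paos+xml"
PAOS_HEADER = ('ver="urn:liberty:paos:2003-08";'
               '"urn:oasis:names:tc:SAML:2.0:profiles:SSO:ecp"')


def build_ecp_request(protected_url):
    """Build the first ECP request sent to the service provider.

    Only the request object is created here; nothing is sent.
    """
    return urllib.request.Request(protected_url, headers={
        "Accept": "text/html, " + PAOS_MEDIA_TYPE,
        "PAOS": PAOS_HEADER,
    })


# Hypothetical usage; the URL is a placeholder.
request = build_ecp_request("https://sp.example.org/secure/data")
```

If the service provider supports ECP, it is expected to reply with a SOAP envelope containing the authentication request. The client then relays that to the IdP's ECP endpoint, typically with HTTP Basic credentials, and posts the IdP's answer back to the service provider. Those later steps are omitted here because the endpoint URLs depend on the deployment.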

Erwin

score 0 · Answer 3 · answered Jan 04 '13 at 14:14

You can also try Apache JMeter: record your actions, add some scripting (which is not so easy with Shibboleth), and you can then access these pages automatically.

[Edit - better solution] I believe the Shibboleth documentation pages include scripts for Grinder (another load-testing tool). These test plans were in fact Python (well, Jython) scripts, which should be fairly easy to modify and use for your purposes.

score 0 · Answer 4 · answered Jun 26 '19 at 19:01

Very late reply, but you could use Facebook Webdriver to do a login and scrape after you're authenticated.

jhchnc

score 0 · Answer 5 · answered Jun 19 '11 at 23:37

You could use Mechanize to submit forms and login to the website: http://wwwsearch.sourceforge.net/mechanize/
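
A minimal sketch of what that could look like with the Python mechanize package follows. It rests on several assumptions: the login form is the first form on the IdP page, its fields are named `j_username` and `j_password` (common Shibboleth defaults, but not guaranteed), and the page returned after login is the SAML relay form, which has to be submitted by hand because mechanize does not run JavaScript. The function name `login_with_mechanize` and its `browser` parameter are made up for illustration.

```python
def login_with_mechanize(protected_url, username, password, browser=None):
    """Log in through the IdP form and return the final response."""
    if browser is None:
        import mechanize  # third-party package: pip install mechanize
        browser = mechanize.Browser()
        browser.set_handle_robots(False)  # do not fetch or obey robots.txt

    # Opening the protected page follows the redirect to the IdP login form.
    browser.open(protected_url)

    # Fill in and submit the login form (assumed to be the first form).
    browser.select_form(nr=0)
    browser["j_username"] = username
    browser["j_password"] = password
    browser.submit()

    # Assumption: the next page is the auto-submitting SAMLResponse form.
    browser.select_form(nr=0)
    return browser.submit()
```

The optional `browser` argument exists only so the function can be exercised with a stand-in object; in normal use it would be left out and a `mechanize.Browser` created internally.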

hoju
