1

I am trying to scrape data from one of my university's websites, which uses Shibboleth for authentication. I am having difficulty working out the best way to get past the login and reach the page I want to scrape. I have valid credentials that I could log in with. Does anyone have suggestions for how to accomplish this?

Brian Tompsett - 汤莱恩
Matt

5 Answers

1

I have been scripting Shibbolized logins successfully (in my case, to monitor the health of both the Shibboleth IdP and the applications it protects).

I am using Python's urllib module and its classes to handle redirect following, cookie passing (for Shibboleth), and posting the login form. After a little tinkering, urllib gets you most of the way to a successful Shibbolized login. You could use this approach to handle the initial login to the Shibbolized website and then do the scraping with straightforward use of urllib.
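As a rough illustration of that flow (this is my own minimal sketch, not the answerer's script), the steps could look something like the following. The field names `j_username`, `j_password`, and `_eventId_proceed` are assumptions based on common Shibboleth IdP login pages and will likely need adjusting for a given site; `parse_form` and `shibboleth_login` are hypothetical helper names.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class FormParser(HTMLParser):
    """Collect the action URL and <input> fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def parse_form(html, base_url):
    """Return (absolute action URL, dict of input fields) for the first form."""
    parser = FormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found on page")
    return urllib.parse.urljoin(base_url, parser.action), parser.fields


def shibboleth_login(protected_url, username, password):
    """Log in and return an opener whose cookie jar holds the session cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # 1. Request the protected page; urllib follows the redirects to the IdP.
    resp = opener.open(protected_url)
    action, fields = parse_form(resp.read().decode("utf-8", "replace"), resp.geturl())

    # 2. Post the credentials (field names are assumptions, see above).
    fields.update({"j_username": username, "j_password": password,
                   "_eventId_proceed": ""})
    resp = opener.open(action, urllib.parse.urlencode(fields).encode())

    # 3. The IdP answers with a form carrying the SAML response, which a
    #    browser would submit via JavaScript; here it is posted by hand.
    action, fields = parse_form(resp.read().decode("utf-8", "replace"), resp.geturl())
    opener.open(action, urllib.parse.urlencode(fields).encode())
    return opener
```

After a successful login, the returned opener can be reused for the actual scraping, for example `opener.open(page_url).read()`. Sites that insert extra steps (a consent page, multi-factor prompts) would need additional form round trips.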

Example Python script for logging into Shibboleth

Community
chladni
0

I believe the ECP profile was designed to let non-browser clients (e.g. command-line tools) access Shibboleth-protected resources.

Try one of the sample clients available on the Shibboleth wiki page I linked above.
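To give a feel for what such a client does, here is a small sketch of only the first step of an ECP exchange: asking the service provider for the resource while advertising PAOS support, as described in the SAML 2.0 ECP profile. The function name `build_ecp_request` is hypothetical, and the remaining steps (relaying the SOAP request to the IdP's ECP endpoint with credentials, then posting the response back to the SP) are deliberately left out. It also assumes the IdP has ECP enabled, which is not always the case.

```python
import urllib.request

# Media type and PAOS header value defined by the SAML 2.0 ECP profile.
PAOS_MEDIA_TYPE = "application/vnd.paos+xml"
PAOS_HEADER = ('ver="urn:liberty:paos:2003-08";'
               '"urn:oasis:names:tc:SAML:2.0:profiles:SSO:ecp"')


def build_ecp_request(protected_url):
    """Build the initial GET that tells the SP this client speaks ECP."""
    return urllib.request.Request(
        protected_url,
        headers={
            "Accept": f"text/html; {PAOS_MEDIA_TYPE}",
            "PAOS": PAOS_HEADER,
        },
    )
```

An ECP-capable SP should answer this request with a SOAP envelope containing the authentication request instead of redirecting to a login page.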

Erwin
0

You can also try Apache JMeter: record your actions, add some scripting (which is not so easy where Shibboleth is concerned), and you can then access these pages automatically.

[Edit - better solution] I believe the Shibboleth documentation pages include scripts for Grinder (another load-testing tool). These test plans were in fact Python (well, Jython) scripts, which should be quite easy to modify and use for your purposes.

Erwin
0

Very late reply, but you could use Facebook WebDriver to log in and then scrape once you're authenticated.

jhchnc
0

You could use Mechanize to submit forms and log in to the website: http://wwwsearch.sourceforge.net/mechanize/

hoju