Scraping a dynamically loaded, JavaScript-laden website using Python's BeautifulSoup

Asked Aug 03 '13 at 15:18

Active Aug 03 '13 at 15:18

I've just started screen scraping with BeautifulSoup in Python 2.7.2, and I want to get data from this website:

However, when I open this URL with urllib2 and parse it with lxml, the output of the .prettify() function is mostly garbage.

After viewing the page source, I see that the page is actually rendered with JavaScript and that the divs are loaded dynamically.
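One way to confirm this diagnosis is to check whether the data appears in the raw HTML the server returns, before any JavaScript has run. The sketch below is written for Python 3 and uses the standard library's `html.parser` instead of BeautifulSoup so it has no dependencies. It works on a made-up snippet: the `results` id and both scripts are hypothetical placeholders, not the real site's markup. It collects the text inside a given `div` and counts `<script>` tags; an empty container next to scripts is the typical sign that the content is filled in client-side.

```python
from html.parser import HTMLParser

# Hypothetical page source: the container is empty in the HTML the server
# sends, and a script is expected to fill it in later inside the browser.
SAMPLE_HTML = """
<html>
  <body>
    <div id="results"></div>
    <script src="/static/app.js"></script>
    <script>loadResults("#results");</script>
  </body>
</html>
"""


class ContainerInspector(HTMLParser):
    """Collects the text inside the <div> with a given id and counts <script> tags."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # > 0 while inside the target div
        self.text_parts = []
        self.script_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if self.depth and tag == "div":
            self.depth += 1     # a nested div inside the target
        elif not self.depth and tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1      # entering the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text_parts.append(data)


def inspect(html, target_id):
    """Return (text inside the target div, number of <script> tags)."""
    parser = ContainerInspector(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.text_parts).strip(), parser.script_count


text, scripts = inspect(SAMPLE_HTML, "results")
print(repr(text), scripts)  # prints: '' 2
```

If the real page gives the same kind of result (no text, several scripts), no HTML parser will find the data in that response; the content has to come from a browser or from the requests the scripts make.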

Does anyone know how to get the data from this website?

Thank you

Hamza Tahir

Automate a browser (using SeleniumHQ, for instance), then retrieve the DOM from it. – Jon Clements Aug 03 '13 at 15:42
Or, look at the request URLs the JavaScript uses and mimic those requests... just make sure you read the site's policy on doing so, if it has one. – Jon Clements Aug 03 '13 at 15:43
Thanks for the reply. Could you point me to some helpful tutorials, or in the right direction, for SeleniumHQ? I looked at their website, and it's a little complicated for someone like me who's new to Python. – Hamza Tahir Aug 03 '13 at 16:18
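A minimal sketch of the browser-automation approach from the first comment, assuming Python 3, Selenium 4 (`pip install selenium`) and a local Chrome install. The URL and the `div.result` selector are hypothetical placeholders, since the question does not name the site or its markup. The browser runs the page's JavaScript, the script waits until the dynamically loaded elements exist, and then `driver.page_source` (the rendered DOM, not the original source) goes to a parser. Selenium is imported inside the function so the parsing half can be tried without a browser; you could equally pass `page_source` to BeautifulSoup as in the question.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load `url` in headless Chrome, wait for `css_selector`, return the rendered HTML.

    Assumes Selenium 4 and Chrome are installed; the imports are local so the
    rest of this module works without them.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's JavaScript has inserted a matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source  # the DOM after the scripts have run
    finally:
        driver.quit()


class DivTextCollector(HTMLParser):
    """Collects the text of every <div> whose class list contains `css_class`."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while inside a matching div
        self.current = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.items.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_divs(html, css_class):
    """Return the text of each matching div, in document order."""
    collector = DivTextCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.items


# Usage (hypothetical URL and selector):
#   html = fetch_rendered_html("https://example.com/data", "div.result")
#   print(extract_divs(html, "result"))
```

The explicit wait is the important design choice: reading `page_source` immediately after `driver.get()` can return the page before the dynamic divs have been loaded.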

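The second comment's alternative is often simpler: in the browser's developer tools (Network tab, XHR/Fetch filter), find the request the page's JavaScript makes for its data, then call that URL directly. The sketch below uses Python 3's `urllib.request` and assumes the endpoint returns JSON. The endpoint URL, the query parameters, the header values and the `items`/`name`/`value` field names are all hypothetical and must be replaced with whatever the real request uses. Building, fetching and parsing are separate functions so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, as it might appear in the browser's Network tab.
API_URL = "https://example.com/api/data"


def build_request(params):
    """Build a GET request that imitates the page's own XHR call."""
    url = API_URL + "?" + urlencode(params)
    headers = {
        # Some sites check these; copy the values the browser actually sends.
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


def parse_payload(raw_bytes):
    """Turn a JSON response body into (name, value) pairs.

    The "items", "name" and "value" keys are assumptions about the response shape.
    """
    data = json.loads(raw_bytes.decode("utf-8"))
    return [(item["name"], item["value"]) for item in data.get("items", [])]


def fetch(params, timeout=10):
    """Send the request and parse the result (needs network access)."""
    with urlopen(build_request(params), timeout=timeout) as response:
        return parse_payload(response.read())
```

When this works it is usually faster and lighter than driving a browser, because only the data is downloaded and no JavaScript has to run.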
0 Answers