Scraping Web Applications with Python

Question

Most of my experience with web scraping thus far has been fairly straightforward and easy to figure out. Send the request, download the HTML, and pull out the information needed. At the moment, I am interested in scraping top play data from the Spotify Web Application. This data is not accessible via their API, but it can be seen when navigating through different artist pages.

For example, The National's top played tracks can be found at this link: https://play.spotify.com/artist/2cCUtGK9sDU2EoElnk0GNB

My question is, how is this data generated behind the scenes and is it possible to scrape this data?

score 3 · Answer 1 · edited May 23 '17 at 11:52

The data is generated dynamically, so downloading the HTML alone won't do the trick. The frontend appears to use FLEX, with what seems like C++/Python on the backend (according to this). Either way, scraping JavaScript-generated content is a pain in the ass: it is a lot more complicated than scraping a static website, because the content only exists after a browser has run the page's scripts.

I suggest using either PhantomJS (a headless WebKit browser scriptable through a JS API) or Selenium (browser automation, usable for both testing and scraping).
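As a rough illustration of the Selenium route, here is a minimal sketch, not a tested Spotify scraper. It assumes the `selenium` package, Firefox, and geckodriver are installed. The `track-title` class name is hypothetical; replace it with whatever the inspector actually shows for the elements you need. The idea is to let a real browser run the page's JavaScript, wait until the element appears, and only then read the rendered DOM.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored when tracking nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def fetch_rendered_html(url, css_selector, wait_seconds=10):
    """Load `url` in a real browser and return the DOM after scripts have run."""
    # Imported lazily so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox + geckodriver available
    try:
        driver.get(url)
        # Block until the dynamically generated element exists (or time out).
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


class TextByClassParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def extract_text_by_class(html, css_class):
    """Return the stripped text of each element with class `css_class`."""
    parser = TextByClassParser(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts if text.strip()]


if __name__ == "__main__":
    # "track-title" is a hypothetical class name, used only for illustration.
    page = fetch_rendered_html(
        "https://play.spotify.com/artist/2cCUtGK9sDU2EoElnk0GNB",
        ".track-title",
    )
    print(extract_text_by_class(page, "track-title"))
```

Once you have the rendered HTML, the extraction step is the same as for a static site, so you can swap the small parser above for whatever parsing library you already use.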


answered Feb 09 '15 at 18:08

Raito


Currently inspecting the page with Firebug and the data I am looking for cannot be found within any HTML tags. Most dynamically generated sites still allow a person to see content between tags, and it seems strange that nothing is displayed on this site. – Jake DeVries Feb 09 '15 at 21:10

Try using the "Select an element to inspect it" tool to find the HTML tag. – Raito Feb 09 '15 at 21:50
