I'm trying to download the HTML of www.pandora.com/profile/stations/olin_d_kirkland with Java so that it matches what I get when I select 'View page source' from the page's context menu in Chrome.
I know how to download a webpage's HTML source with Java. I have done it with downloads.nl and tested it on other sites. Pandora, however, is proving to be a mystery. My ultimate goal is to parse the 'Stations' from a Pandora account.
Specifically, I would like to grab the station names from a page such as www.pandora.com/profile/stations/olin_d_kirkland
I have tried both the Selenium library and Java's built-in URL fetching, but I only get ~4700 lines of HTML when I should be getting 5300. On top of that, the result contains none of the personalized data, which is what I'm looking for.
I assumed the problem was that the JavaScript wasn't being fetched or executed before I read the source, but even when my code waited for the page to load, I always got the same result.
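For reference, here is a minimal sketch of what the plain built-in URL approach amounts to. It is an illustration, not my exact code: the class name PageSourceSketch, the helper readAll, the https:// scheme, and the UTF-8 encoding are all assumptions. A fetch like this returns only the HTML the server sends and never runs any JavaScript, which might explain the missing lines, although I have not confirmed that this is the cause.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class PageSourceSketch {

    // Reads the whole stream as UTF-8 text and joins the lines with '\n'.
    static String readAll(InputStream in) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().collect(Collectors.joining("\n"));
    }

    // Fetches the raw HTML the server sends; no JavaScript is executed.
    static String grabPageSource(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // The https:// scheme is an assumption; the address above has none.
        String html = grabPageSource(
                "https://www.pandora.com/profile/stations/olin_d_kirkland");
        System.out.println(html.split("\n").length + " lines downloaded");
    }
}
```

Comparing the printed line count with the count from 'View page source' is a quick way to see whether the two sources differ.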
Ideally, I would have a method called 'grabPageSource()' that returns the page's source code as a String when called.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PandoraStationFinder {

    public static void main(String[] args) throws IOException, InterruptedException {
        String s = grabPageSource();
        // Split on either Unix (\n) or Windows (\r\n) line endings.
        String[] lines = s.split("\\r?\\n");
        // Compile the pattern once, outside the loop.
        Pattern p = Pattern.compile("<a href=\"/station/\\d+\">[\\w\\s]+</a>");
        List<Station> stations = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            String t = lines[i].trim();
            Matcher m = p.matcher(t);
            // matches() succeeds only if the whole trimmed line is a station link.
            if (m.matches()) {
                // Station is a separate class that is not shown here.
                Station someStation = new Station(t);
                stations.add(someStation);
                // System.out.println("I found a match on line " + i + ".");
                // System.out.println(t);
            }
        }
    }

    public static String grabPageSource() throws IOException {
        String fullTxt = "";
        // Get HTML from www.pandora.com/profile/stations/olin_d_kirkland
        return fullTxt;
    }
}
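The matching step can also be sketched on its own. The version below reuses the anchor pattern from the code above but adds a capture group and uses find() instead of matches(), so it picks up station links anywhere in the text rather than only on lines that consist of nothing but the link. The class name StationNameSketch, the method extractStationNames, and the sample markup are hypothetical; the real Pandora markup may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StationNameSketch {

    // Same anchor shape as the pattern above, with a capture group
    // around the link text so the station name can be pulled out.
    private static final Pattern STATION_LINK =
            Pattern.compile("<a href=\"/station/\\d+\">([\\w\\s]+)</a>");

    // Returns the link text of every station anchor found in the HTML.
    static List<String> extractStationNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = STATION_LINK.matcher(html);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical markup for illustration only; not copied from Pandora.
        String sample = "<li><a href=\"/station/123\">Jazz Radio</a></li>\n"
                + "<li><a href=\"/station/456\">Indie Mix</a></li>";
        System.out.println(extractStationNames(sample)); // prints [Jazz Radio, Indie Mix]
    }
}
```

Note that [\\w\\s]+ only accepts letters, digits, underscores and whitespace, so a station name containing punctuation such as an apostrophe or ampersand would be skipped by this pattern.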
How it's done doesn't matter, but in the final product I'd like to grab a comprehensive list of ALL songs that a user has liked on Pandora.