I am trying to scrape a website in Java to extract some percentages from a table (this one).
These percentages only appear after the HTML source has been processed, so the elements must be rendered via JavaScript, which makes scraping harder.
So this is the difference between the element BEFORE being rendered:
<div class="user_forecasts" id="57464" />
and AFTER being rendered:
<div class="user_forecasts" id="57464"> <b>1</b>
<div class="percents">61% | 25% | 14%</div>
</div>
Obviously, I want to get the "61% | 25% | 14%" string, and the rest of the percentages in the table.
It is indeed rendered by JavaScript: I found the .js file and, luckily, the relevant part:
// ajax user_forecast load - one call
if ($('div.user_forecasts').length > 0) {
    $.ajax({
        url: '/vote/percentage',
        global: false,
        type: 'GET',
        data: {
            a: $('#jornadaq').val()
        },
        success: function(percentages) {
            perc_obj = eval(percentages);
            $('div.user_forecasts').each(function(ind, val) {
                if (ind == 14) {
                    $(this).html("<b>" + perc_obj[ind].value + "</b><div class='percents'>" + perc_obj[ind].porcent + "%" + "</div>");
                } else {
                    $(this).html("<b>" + perc_obj[ind].forecast + "</b><div class='percents'>" + perc_obj[ind].local + "% | " + perc_obj[ind].tie + "% | " + perc_obj[ind].visitor + "%" + "</div>");
                }
            });
        }
    });
}
As you can see, it's an AJAX call. I checked whether I could get the percentages by pasting this code into the Chrome Developer Tools console, and yes, I got what I wanted: the group of elements containing the data I need for my program.
Please see this screenshot (Chrome Developer Tools console).
The thing is, I don't know how to make this XMLHttpRequest from Java and then get the data. What libraries do you recommend for this, and how could I use them specifically for this case?
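For reference, here is a minimal sketch of how the request might be reproduced in Java, assuming Java 11 or later (for `java.net.http.HttpClient`). The class name `PercentageFetcher`, the base URL `https://example.com` and the sample value for `a` are hypothetical placeholders, not taken from the real site. It builds the same `/vote/percentage?a=...` URL that the script requests and returns the raw response body; `formatRow` mirrors how the script joins `local`, `tie` and `visitor`. Parsing the body into objects would still need a JSON library and is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class PercentageFetcher {

    // Builds the URL that the $.ajax call requests:
    // <baseUrl>/vote/percentage?a=<value of #jornadaq>.
    // baseUrl is an assumption; the real host is not given in the script.
    public static URI buildUri(String baseUrl, String jornada) {
        String query = "a=" + URLEncoder.encode(jornada, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/vote/percentage?" + query);
    }

    // Sends the GET request and returns the raw response body as text.
    public static String fetchBody(String baseUrl, String jornada)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildUri(baseUrl, jornada))
                // jQuery adds this header to same-origin AJAX calls;
                // whether the server checks it is an assumption.
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Mirrors the string the script builds for a regular row:
    // local + "% | " + tie + "% | " + visitor + "%".
    public static String formatRow(int local, int tie, int visitor) {
        return local + "% | " + tie + "% | " + visitor + "%";
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // No network call without explicit arguments; just show the URL shape.
            System.out.println(buildUri("https://example.com", "1"));
            return;
        }
        // Usage: java PercentageFetcher.java <baseUrl> <jornada>
        System.out.println(fetchBody(args[0], args[1]));
    }
}
```

The value passed as `a` would have to be read from the `#jornadaq` input in the page's HTML first, since the script takes it from there.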