I'm trying to fetch data from this webpage: http://www.atm-mi.it/en/Giromilano/Pages/default.aspx. I'm using HtmlUnit in Java to drive the "Route and timetable finder" in the middle of the left column: I loop through each option in the select, click "Find", and collect the data I need from the resulting pages.
I've had no problem extracting data for urban routes, but I can't handle the radio buttons above the select: in a browser, clicking "Underground", for example, loads a new page with different options in the select below.
In HtmlUnit, however, I keep getting the same select as before; more precisely, I get the same page back (page2 has the same HTML as page).
Something must be going wrong with the click() call, but what?
Here is a simplified version of my code:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
webClient.setThrowExceptionOnScriptError(false);
HtmlPage page = webClient.getPage("http://www.atm-mi.it/en/Giromilano/Pages/default.aspx");
HtmlRadioButtonInput radioButton2 = (HtmlRadioButtonInput) page.getElementById("ctl00_SPWebPartManager1_g_e31ad29e_62a8_401c_43ae_eb61300b4fc0_lines_type_rbl_0");
HtmlPage page2 = radioButton2.click();
HtmlSelect lineSelect = (HtmlSelect) page2.getElementById("ctl00_SPWebPartManager1_g_e31ad29e_62a8_401c_43ae_eb61300b4fc0_txt_dp_lines");
int size = lineSelect.getOptionSize();
System.out.println(size);
This is the radio button input HTML:
<input id="ctl00_SPWebPartManager1_g_e31ad29e_62a8_401c_43ae_eb61300b4fc0_lines_type_rbl_0" type="radio" name="ctl00$SPWebPartManager1$g_e31ad29e_62a8_401c_43ae_eb61300b4fc0$lines_type_rbl" value="0" onclick="javascript:setTimeout('__doPostBack(\'ctl00$SPWebPartManager1$g_e31ad29e_62a8_401c_43ae_eb61300b4fc0$lines_type_rbl$0\',\'\')', 0)" />
<label for="ctl00_SPWebPartManager1_g_e31ad29e_62a8_401c_43ae_eb61300b4fc0_lines_type_rbl_0">Underground</label>
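Note that the onclick handler above does not post back directly: it wraps __doPostBack in setTimeout(..., 0), so the postback is only queued and runs after the click handler has returned. If click() returns at that point, page2 would still be the original page. The following is a stdlib-only model of that timing (not HtmlUnit code; DeferredClickModel and its methods are hypothetical names), using a single-threaded job queue as a stand-in for the browser's timer queue:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stdlib-only model (not HtmlUnit): a single-threaded job queue standing in
// for the browser's timer queue, to show why click() can return the old page.
public class DeferredClickModel {
    private final Deque<Runnable> timerJobs = new ArrayDeque<>();
    private String currentPage = "page";

    // Models setTimeout(fn, 0): fn is queued, not run immediately.
    void setTimeout(Runnable job) {
        timerJobs.add(job);
    }

    // Models the radio button's onclick: it only schedules the postback,
    // then returns whatever page is loaded at that moment.
    String click() {
        setTimeout(() -> currentPage = "page2"); // the deferred __doPostBack
        return currentPage;
    }

    // Models waiting for background JavaScript: run every queued job.
    void drainTimers() {
        while (!timerJobs.isEmpty()) {
            timerJobs.poll().run();
        }
    }

    String currentPage() {
        return currentPage;
    }

    public static void main(String[] args) {
        DeferredClickModel browser = new DeferredClickModel();
        String page2 = browser.click();
        System.out.println(page2);                 // page
        browser.drainTimers();
        System.out.println(browser.currentPage()); // page2
    }
}
```

If this is the cause, the HtmlUnit counterpart would be to call webClient.waitForBackgroundJavaScript(...) after the click and then re-read the page from the window (e.g. page.getEnclosingWindow().getEnclosedPage()) instead of relying on the value returned by click(); whether that is enough here depends on the HtmlUnit version and on the site's scripts.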
The select:
<select name="ctl00$SPWebPartManager1$g_e31ad29e_62a8_401c_43ae_eb61300b4fc0$txt_dp_lines" id="ctl00_SPWebPartManager1_g_e31ad29e_62a8_401c_43ae_eb61300b4fc0_txt_dp_lines" class="dplinee">
EDIT: I've tried a different approach. Since this looked like a problem with the JavaScript engine, I figured I could disable JavaScript and perform the onclick action myself. This is the original JavaScript function:
var theForm = document.forms['aspnetForm'];
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
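What __doPostBack ends up sending can be sketched without a browser: it overwrites the two hidden fields and submits the whole form, so the request also carries every other field, including the radio group's own value. A minimal stdlib sketch (not HtmlUnit; PostbackBody is a hypothetical name, the map of form fields is a hypothetical stand-in for the real form, which also contains fields such as __VIEWSTATE, and the onsubmit check is ignored; requires Java 10+ for URLEncoder.encode with a Charset):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Stdlib-only sketch of the urlencoded body a __doPostBack submission carries:
// all form fields, with __EVENTTARGET and __EVENTARGUMENT overwritten.
public class PostbackBody {
    static String build(Map<String, String> formFields, String eventTarget, String eventArgument) {
        Map<String, String> fields = new LinkedHashMap<>(formFields);
        fields.put("__EVENTTARGET", eventTarget);     // theForm.__EVENTTARGET.value = eventTarget
        fields.put("__EVENTARGUMENT", eventArgument); // theForm.__EVENTARGUMENT.value = eventArgument
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String prefix = "ctl00$SPWebPartManager1$g_e31ad29e_62a8_401c_43ae_eb61300b4fc0$";
        Map<String, String> form = new LinkedHashMap<>();
        form.put("__EVENTTARGET", "");
        form.put("__EVENTARGUMENT", "");
        form.put(prefix + "lines_type_rbl", "0"); // value of the checked radio button
        System.out.println(build(form, prefix + "lines_type_rbl$0", ""));
    }
}
```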
And this is what I did:
HtmlForm aspnetForm = (HtmlForm) page.getElementById("aspnetForm");
HtmlHiddenInput eventTarget = (HtmlHiddenInput) page.getElementById("__EVENTTARGET");
HtmlHiddenInput eventArgument = (HtmlHiddenInput) page.getElementById("__EVENTARGUMENT");
eventTarget.setValueAttribute("ctl00$SPWebPartManager1$g_e31ad29e_62a8_401c_43ae_eb61300b4fc0$lines_type_rbl$0");
eventArgument.setValueAttribute("");
HtmlElement submitButton = (HtmlElement) page.createElement("button");
submitButton.setAttribute("type", "submit");
aspnetForm.appendChild(submitButton);
HtmlPage page2 = submitButton.click();
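One thing worth checking in this manual version: nothing marks the Underground radio button as checked. On form submission only the checked radio button in a group contributes its name=value pair, so the request would still carry whichever option was selected when the page loaded, and the server may see no change of selection. A stdlib model of that rule (not HtmlUnit; RadioGroupModel is a hypothetical name, and it assumes, hypothetically, that the option with value "1" is the one checked on page load):

```java
import java.util.List;
import java.util.Optional;

// Stdlib-only model of how a radio group is serialized on submit:
// only the checked button contributes its value.
public class RadioGroupModel {
    static final class Radio {
        final String value;
        boolean checked;

        Radio(String value, boolean checked) {
            this.value = value;
            this.checked = checked;
        }
    }

    // Returns the value the group contributes to the submission, if any.
    static Optional<String> submittedValue(List<Radio> group) {
        return group.stream().filter(r -> r.checked).map(r -> r.value).findFirst();
    }

    // Mirrors checking one button: every other button in the group is unchecked.
    static void check(List<Radio> group, String value) {
        for (Radio r : group) {
            r.checked = r.value.equals(value);
        }
    }

    public static void main(String[] args) {
        // Hypothetical group: "0" is Underground, "1" is assumed checked on load.
        List<Radio> group = List.of(new Radio("0", false), new Radio("1", true));
        System.out.println(submittedValue(group).orElse("(none)")); // 1
        check(group, "0");
        System.out.println(submittedValue(group).orElse("(none)")); // 0
    }
}
```

In HtmlUnit the corresponding step would presumably be calling radioButton2.setChecked(true) before submitting the form.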
That runs fine, but I still get the same page with the same old select. I know this is a long question, but I thought the update was worth adding. I hope somebody has the patience to try this out (and at least confirm I'm not making some obvious mistake).