
http://www.smbs.biz/ExRate/StdExRate.jsp

On this website, I am trying to parse the currency values from the table.

The value below in the table is the one I want to extract.

In the browser's developer tools, I can see the value only in the 'Elements' panel, not in the 'Sources' panel. I guess the data is loaded via Ajax? How can I extract the data using Jsoup?

Here's the code I tried, which failed:

try {
    doc = Jsoup.connect("http://www.smbs.biz/ExRate/StdExRate.jsp").get();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
//Elements exchangeRateElement = doc.select(".brb0 td:nth-child(3)").eq(1);
Element exchangeRateElement = doc.getElementsByClass("brb0").get(10);

String cur = null;

for (Node node : doc.childNodes()) {
    System.out.println("node : " + node);
    if (node instanceof TextNode) {
        cur = ((TextNode) node).getWholeText();
        cur = ((TextNode) node).text();
        break;
    }
}
Obito
FGOG
  • Here's another topic that has some relevant links, but the short answer is that there is no way to do it with jsoup. It's an HTML parser and can't parse values generated by JavaScript calls on the page: http://stackoverflow.com/questions/7488872/page-content-is-loaded-with-javascript-and-jsoup-doesnt-see-it – Brion Oct 04 '16 at 19:07
  • @Brion If rendering the page is your target, I agree that most of the time you have to use something like HtmlUnit or PhantomJS. If you are after specific elements, it depends on how the content is generated: if it is just loaded using JavaScript, then jsoup can be used (by imitating the same requests); if the content is computed, then we need additional tools. In this case, jsoup can do the job; I will write an answer. – Frederic Klein Oct 05 '16 at 08:48

1 Answer


When we load the page in a browser with JavaScript disabled, we see that the table remains empty.

After enabling JavaScript and monitoring the Network tab (Chrome DevTools, F12) during a reload, we see this request:

http://www.smbs.biz/ExRate/StdExRate_xml.jsp?arr_value=USD_2016-09-13_2016-10-05
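
The `arr_value` parameter in this request appears to be the currency code and the start and end dates joined by underscores. A minimal sketch of building such a URL, assuming that pattern also holds for other date ranges (the class and method names are hypothetical, and dates may be passed with dots or dashes as separators):

```java
class ChartUrlSketch {

    // Endpoint observed in the browser's network tab.
    static final String BASE = "http://www.smbs.biz/ExRate/StdExRate_xml.jsp";

    // Builds the request URL; dots in the dates are converted to dashes.
    // Assumption: the server accepts CURRENCY_start_end for any date range.
    static String chartUrl(String currency, String startDate, String endDate) {
        return BASE + "?arr_value=" + currency
                + "_" + startDate.replace(".", "-")
                + "_" + endDate.replace(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(chartUrl("USD", "2016.09.13", "2016.10.05"));
    }
}
```

For the dates shown here, this reproduces the URL observed in the network tab.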

The response contains a chart with the needed information:

<chart 
    [...]   
    <set color='c93749' label='16.09.13' value='1110.6' />
    <set color='c93749' label='16.09.19' value='1112.3' />
    [...]
    <set color='c93749' label='16.10.04' value='1102' />
    <set color='c93749' label='16.10.05' value='1105.1' />
    <styles>
        [...]
    </styles>
</chart>
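
To illustrate how the `label` and `value` attributes can be read from such a response, here is a minimal sketch that uses the JDK's built-in DOM parser instead of jsoup, so it runs without extra dependencies. The class and method names are hypothetical, and the XML in `main` is a shortened sample modeled on the response above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ChartXmlSketch {

    // Returns label -> value for every <set> element, in document order.
    static Map<String, String> parseRates(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList sets = doc.getElementsByTagName("set");
            Map<String, String> rates = new LinkedHashMap<>();
            for (int i = 0; i < sets.getLength(); i++) {
                Element set = (Element) sets.item(i);
                rates.put(set.getAttribute("label"), set.getAttribute("value"));
            }
            return rates;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse chart XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<chart>"
                + "<set color='c93749' label='16.10.04' value='1102' />"
                + "<set color='c93749' label='16.10.05' value='1105.1' />"
                + "</chart>";
        parseRates(xml).forEach((label, value) -> System.out.println(label + ": " + value));
    }
}
```

With jsoup, the equivalent selection is `doc.select("chart > set")`.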

Before we request the chart, we need to grab the session cookie (JSESSIONID) from the response to the first request and add it to the chart request.
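
At the HTTP level, this means taking the `Set-Cookie` header from the first response and sending its name=value pair back in a `Cookie` header. jsoup handles this through `res.cookies()` and `.cookies(...)`. The sketch below only illustrates the idea with plain strings; the class name, method name, and session id are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SessionCookieSketch {

    // Turns Set-Cookie header values into a single Cookie request header value.
    // Only the leading name=value pair of each header is kept; attributes such
    // as Path or HttpOnly are not sent back to the server.
    static String toCookieHeader(List<String> setCookieHeaders) {
        List<String> pairs = new ArrayList<>();
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            if (pair.contains("=")) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Hypothetical header value; a real session id comes from the first response.
        List<String> headers = Arrays.asList("JSESSIONID=ABC123; Path=/; HttpOnly");
        System.out.println(toCookieHeader(headers)); // prints JSESSIONID=ABC123
    }
}
```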

Example Code

import java.io.IOException;

import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";

try {
    // First request: the response is needed to grab the cookies via res.cookies()
    Response res = Jsoup.connect("http://www.smbs.biz/ExRate/StdExRate.jsp").timeout(10000)
            .userAgent(userAgent).method(Method.GET).header("Host", "www.smbs.biz").execute();

    Document doc = res.parse();

    // The page's date fields use dots; the chart request expects dashes
    String startDate = doc.getElementById("startDate").attr("value").replace(".", "-");
    String endDate = doc.getElementById("endDate").attr("value").replace(".", "-");

    // Second request: imitate the Ajax call, passing the session cookies along
    doc = Jsoup.connect("http://www.smbs.biz/ExRate/StdExRate_xml.jsp?arr_value=USD_" + startDate + "_" + endDate)
            .userAgent(userAgent).timeout(10000).header("Host", "www.smbs.biz").cookies(res.cookies())
            .header("Connection", "keep-alive")
            .referrer("http://www.smbs.biz/ExRate/StdExRate.jsp").get();

    Elements elements = doc.select("chart > set");

    for (Element element : elements) {
        System.out.println(element.attr("label") + ": " + element.attr("value"));
    }

    // The last <set> element holds the most recent rate
    Element currentRateElement = elements.last();

    System.out.println("Current rate for " + currentRateElement.attr("label") + ": " + currentRateElement.attr("value"));

} catch (IOException e) {
    e.printStackTrace();
}

Output

16.09.13: 1110.6
16.09.19: 1112.3
16.09.20: 1120
16.09.21: 1119.5
16.09.22: 1116.8
16.09.23: 1103.1
16.09.26: 1104.2
16.09.27: 1106.9
16.09.28: 1103.5
16.09.29: 1095.7
16.09.30: 1096.3
16.10.04: 1102
16.10.05: 1105.1
Current rate for 16.10.05: 1105.1
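
If the values are needed as typed data rather than strings, the labels and values can be converted after extraction. The following is a minimal sketch, assuming the labels follow the pattern `yy.MM.dd` (inferred from the date range in the request URL); the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class RateLineSketch {

    // Labels look like "16.10.05"; two-digit years are resolved to 2000-2099.
    private static final DateTimeFormatter LABEL_FORMAT = DateTimeFormatter.ofPattern("yy.MM.dd");

    // Converts a chart label into a date.
    static LocalDate parseLabel(String label) {
        return LocalDate.parse(label, LABEL_FORMAT);
    }

    // Converts a chart value into an exact decimal number.
    static BigDecimal parseValue(String value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        System.out.println(parseLabel("16.10.05") + " -> " + parseValue("1105.1"));
    }
}
```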
Frederic Klein