
This may be challenging, since the builders of the website seem to be trying to block this.

I am trying to build a simple app to view some data from a website table. The table is located here:

http://www.cepteteb.com.tr/doviz-kurlari

However, the table data seems to load after the page has loaded, so when I try to get the HTML of the table, it comes back empty. How do I get the table with its data?

I am using Jsoup to scrape the table.

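// Imports needed at the top of the enclosing Activity file:
import android.os.AsyncTask;
import android.util.Log;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

private class GetData extends AsyncTask<String, Void, Element> {

    @Override
    protected Element doInBackground(String... params) {
        try {
            // Download and parse the HTML served at the given URL (runs off the UI thread).
            Document document = Jsoup.connect(params[0]).get();
            Log.d("Yiit", document.toString());
            // Look up the exchange-rate table by its id.
            return document.getElementById("dovizTablo");
        } catch (Exception e) {
            Log.e("Yiit", "Failed to fetch or parse the page", e);
        }

        // Reached only if the request or parsing failed.
        return null;
    }

    @Override
    protected void onPostExecute(Element element) {
        super.onPostExecute(element);
        // String.valueOf prints "null" if the lookup failed instead of throwing.
        Log.d("Yiit", String.valueOf(element));
        tvMain.setText(String.valueOf(element));
    }
}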

Result:

<table class="prices prices2" id="dovizTablo"> 
<thead> 
<tr> 
<th>D&ouml;viz Adı</th> 
<th>CEPTETEB Alış</th> 
<th>CEPTETEB Satış</th> 
</tr> 
</thead> 
<tbody> 
</tbody> 
</table>

Expected Behaviour:

<table class="prices prices2" id="dovizTablo">
                            <thead>
                                <tr>
                                    <th>Döviz Adı</th>
                                    <th>CEPTETEB Alış</th>
                                    <th>CEPTETEB Satış</th>
                                </tr>
                            </thead>
                            <tbody>
                            <tr><td>USD</td><td>2.9096 TL</td><td>2.9908 TL</td></tr><tr><td>EUR</td><td>3.1555 TL</td><td>3.2435 TL</td></tr><tr><td>GBP</td><td>4.0558 TL</td><td>4.1688 TL</td></tr></tbody>
                        </table>
yigitserin
  • Try using XPath -- http://stackoverflow.com/questions/7085539/does-jsoup-support-xpath -- e.g. (//*[@id="dovizTablo"]/tbody/tr[1]/td[1]) -- how to get the XPath -- https://www.youtube.com/watch?v=vCNLPHP3E_U – Tasos Mar 01 '16 at 23:55
  • So if I understand correctly, XPath is used to easily access the required element. However, I have no problem navigating to the table; it just comes back empty. So XPath probably won't be helpful anyway. – yigitserin Mar 02 '16 at 00:01
  • You're probably right, as it sounds like a timing issue. XPath gets what's inside an element. Have you tried putting a 2-3 second delay before you scrape? – Tasos Mar 02 '16 at 00:05

2 Answers


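The site you provided fills in the table with JavaScript after the page loads. Jsoup only downloads and parses the HTML the server returns; it does not execute JavaScript, so you can't scrape those rows with Jsoup alone.

I can think of two ways to scrape this page:

  1. Use a WebView to visit the page and then run some JavaScript to extract what you want and return it to the main app. Read this to see how to achieve it.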

  2. Create a web service that can parse and return the data from this site.
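Whichever route you take, you end up with the rendered HTML of the table (the "Expected Behaviour" markup in the question) and still need to pull the rows out of it. Below is a minimal sketch of that last step only, not of the WebView or web service itself. The class name `RateTableParser` and the method `parseRows` are hypothetical names chosen for illustration, and it uses plain regular expressions so it runs on a bare JDK. Regex matching is only reasonable here because the table is small and flat; in the Android app, passing the string to `Jsoup.parse` and selecting the `td` elements would likely be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the data rows from already-rendered table HTML.
public class RateTableParser {

    // Matches one <tr>...</tr> block; DOTALL lets a row span several lines.
    private static final Pattern ROW =
            Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Matches one <td>...</td> cell inside a row.
    private static final Pattern CELL =
            Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns one String[] per row that has <td> cells; header rows (only <th>) are skipped. */
    public static List<String[]> parseRows(String tableHtml) {
        List<String[]> rows = new ArrayList<>();
        Matcher row = ROW.matcher(tableHtml);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells.toArray(new String[0]));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample input copied from the "Expected Behaviour" markup in the question.
        String html = "<table class=\"prices prices2\" id=\"dovizTablo\">"
                + "<thead><tr><th>D\u00f6viz Ad\u0131</th><th>CEPTETEB Al\u0131\u015f</th>"
                + "<th>CEPTETEB Sat\u0131\u015f</th></tr></thead>"
                + "<tbody><tr><td>USD</td><td>2.9096 TL</td><td>2.9908 TL</td></tr>"
                + "<tr><td>EUR</td><td>3.1555 TL</td><td>3.2435 TL</td></tr>"
                + "<tr><td>GBP</td><td>4.0558 TL</td><td>4.1688 TL</td></tr></tbody></table>";
        for (String[] r : parseRows(html)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Each returned array holds the currency code, the buy rate, and the sell rate as strings, in the order the cells appear in the row.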

giannisf
  • Can you point me to any examples on how to create said web service? I am not very experienced in web development. – yigitserin Mar 01 '16 at 23:48
  • If you have a web server, then you can create a web app that scrapes the page. This might help you: http://nrabinowitz.github.io/pjscrape/ – giannisf Mar 01 '16 at 23:51
  • Unfortunately I don't, but how can I test whether this works anyway? – yigitserin Mar 02 '16 at 00:02
  • When I try to load the page in a WebView, the content of the table does not load. So maybe they are somehow blocking unusual browsers? – yigitserin Mar 02 '16 at 00:04
  • @user2741186 What if you change the `user-agent`? http://stackoverflow.com/questions/5586197/android-user-agent – giannisf Mar 02 '16 at 00:08
0

the builders of the website seem to be trying to block this.

So why do you want to scrape the data?

Instead, I suggest you find another source that publicly offers the data you need.

You can also check whether the website you're targeting offers an API.

Stephan