I have a website that contains a table that looks similar to this one (only bigger):

</table>    
<tr>
    <td>
        <table width="100%" cellspacing="-1" cellpadding="0" border="0" dir="rtl" style="padding-top: 25px;">
            <tr>
                <td align="right" style="padding-right: 25px;">
                    <span class="artist_name_txt">
                            <a href="/namelink">name</a>
                            <p class="diccografia">subname</p>
                            </span>
                </td>
            </tr>
        </table>
    </td>
</tr>

<tr>
    <td>
        <table width="100%" border="0" cellspacing="0" cellpadding="0" dir="rtl" style="padding-right: 25px; padding-left: 25px">

                <tr>
                        <td class="songs" align="right">

                                <a href="/number1link" class="artist_player_songlist">  number1</a>

                            </td>
                    </tr>
                <tr>
                        <td class="songs" align="right">

                                <a href="/number2link" class="artist_player_songlist">number2</a>


.......
            </td>   
        </tr>
</table>

I need an idea of how I can parse the website and extract this table into two arrays:

  • one will be something like names{number1, number2}
  • and the second will be links{number1link, number2link}

I have tried many approaches, and none of them really worked.

Didi78

1 Answer

You should read the jsoup Cookbook; the selector syntax in particular is very powerful.

Here's an example:

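import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final String html = ...
// use Jsoup.connect(url).get() instead if you fetch the page from a website
Document doc = Jsoup.parse(html);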
List<String> names = new ArrayList<>();
List<String> links = new ArrayList<>();

for( Element element : doc.select("a.artist_player_songlist") )
{
    names.add(element.text());
    links.add(element.attr("href"));
}

System.out.println("Names: " + names);
System.out.println("Links: " + links);

Output:

Names: [number1, number2]  
Links: [/number1link, /number2link]
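The question asks for two arrays, and the extracted links are relative paths. As a rough sketch (not part of the original answer), the lists could be copied into arrays and the links resolved against the site's address using only the standard library. The class name `SongLinks`, its two helper methods, and the base URL `http://shironet.mako.co.il/` (taken from the URL mentioned in the comments) are assumptions made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class SongLinks {
    // Copies a list into a plain String array.
    static String[] toArray(List<String> values) {
        return values.toArray(new String[0]);
    }

    // Resolves each (possibly relative) link against the base URL.
    static String[] toAbsolute(List<String> links, String baseUrl) {
        URI base = URI.create(baseUrl);
        String[] result = new String[links.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = base.resolve(links.get(i)).toString();
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values matching the output shown above.
        List<String> names = Arrays.asList("number1", "number2");
        List<String> links = Arrays.asList("/number1link", "/number2link");

        String[] nameArray = toArray(names);
        // Assumed base URL; replace it with the address of the page you parsed.
        String[] linkArray = toAbsolute(links, "http://shironet.mako.co.il/");

        System.out.println("Names: " + Arrays.toString(nameArray));
        System.out.println("Links: " + Arrays.toString(linkArray));
    }
}
```

If the document is fetched with a known base URI, jsoup's own `element.absUrl("href")` may be the simpler route; the version above just avoids depending on it.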

ollo
  • Well, I think that code worked only once, and I don't know why. When I use doc = Jsoup.connect(url).get(); (for this URL: http://shironet.mako.co.il/artist?type=works&lang=1&prfid=975), I get this HTML: and not the normal page HTML. I think that may be because of an ad on this page. Is there any way to get past this ad (or is something else the problem here)? – Didi78 May 12 '15 at 16:04
  • The code works for the problem you described. It won't work with the new link because that page relies on JavaScript, and jsoup doesn't support JS execution. However, not all is lost: you can combine jsoup with a library that executes JS. Please see [here](http://stackoverflow.com/questions/20633294/fetch-contentsloaded-through-ajax-call-of-a-web-page/20642675#20642675) for some examples. jsoup + HtmlUnit has proven to be a good combination. – ollo May 12 '15 at 18:07
  • Thanks for the comment, but as I understand it, I can't use HtmlUnit on Android. What is the alternative? Is there a simple tutorial for this? – Didi78 May 13 '15 at 06:27
  • You can also try the others listed in my linked answer. I'll edit some (hopefully) useful discussions into my answer (there's not enough space here in the comments). – ollo May 16 '15 at 17:23
  • OK, thanks, I will try it soon and post an update if it works. Thank you! – Didi78 May 16 '15 at 17:46