
Using Jsoup:

Element movie_div = doc.select("div.movie").first();

I get the following HTML:

<div class="movie"> 
    <div> 
        <div>
            <strong>Year:</strong> 2014
        </div> 
        <div>
            <strong>Country:</strong> USA
        </div> 
    </div> 
</div>

How can I use jsoup to extract the country and the year?

For the example html I want the extracted values to be "2014" and "USA".

Thanks.

fabian
Mark Korzhov
  • I don't know what "parse" exactly means, but if it means that you want to get or change that data, you probably need to wrap each value in a <span>. Each one should have the same class. Then, using jQuery, get all the ".somename" spans and do what you want – Piotrek Aug 18 '14 at 12:10
  • @Ludwik11: 1) There is no need to change the HTML (which may not be possible, or may even be illegal, if it's loaded from a website the OP doesn't own). Text nodes are nodes; they just can't be selected with CSS alone. This is possible even from JavaScript, see http://stackoverflow.com/q/6520192/2991525 . 2) This is NOT about JavaScript. I don't know how you plan to use jQuery from **java**, but to me it sounds like nonsense. – fabian Aug 18 '14 at 13:06

2 Answers


Use

// The inner <div> that wraps the "Year" and "Country" rows
Element e = doc.select("div.movie").first().child(0);
// The value is the last text node of each row, after the <strong> label
List<TextNode> textNodes = e.child(0).textNodes();
String year = textNodes.get(textNodes.size()-1).text().trim();
textNodes = e.child(1).textNodes();
String country = textNodes.get(textNodes.size()-1).text().trim();
fabian
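
The answer above depends on the position of each row (child(0), child(1)). If the order of the rows might change, one possible variation is to look up the `<strong>` label and read the text node that follows it with `nextSibling()`. This is only a sketch: the class name `LabelLookupExample`, the helper `valueAfterLabel`, and the `HTML` constant are hypothetical names for illustration, and it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

class LabelLookupExample {
    // Same markup as in the question (hypothetical constant, for illustration only).
    static final String HTML =
        "<div class=\"movie\">\n"
        + "  <div>\n"
        + "    <div>\n      <strong>Year:</strong> 2014\n    </div>\n"
        + "    <div>\n      <strong>Country:</strong> USA\n    </div>\n"
        + "  </div>\n"
        + "</div>";

    // Returns the text directly following <strong>label</strong>,
    // or null if no such label (or no following text node) exists.
    static String valueAfterLabel(Element root, String label) {
        for (Element strong : root.select("strong")) {
            if (!strong.text().trim().equals(label)) {
                continue;
            }
            Node next = strong.nextSibling();
            if (next instanceof TextNode) {
                return ((TextNode) next).text().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element movie = Jsoup.parse(HTML).select("div.movie").first();
        System.out.println(valueAfterLabel(movie, "Year:"));
        System.out.println(valueAfterLabel(movie, "Country:"));
    }
}
```

Matching on the label text costs an extra loop, but it keeps working if rows are added or reordered.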

Did you try something like:

Element movie_div = doc.select("div.movie strong").first();

And to get the text value, try:

movie_div.text();
Fractaliste
  • That way I get "Year:" and "Country:", but not "2014" and "USA", because `text()` gets the combined text of this element. – Mark Korzhov Aug 18 '14 at 12:23
  • @MarkKorzhov True, you should call `.parent()` before getting the text and then remove the strong tag's text – Fractaliste Aug 18 '14 at 13:28
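
The `.parent()` idea from the last comment can be sketched without removing the label text by hand: Jsoup's `Element.ownText()` returns only the text that belongs directly to an element, not the text of its children. The class name `OwnTextExample`, the method `extract`, and the `HTML` constant below are hypothetical names for illustration, and the sketch assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

class OwnTextExample {
    // Same markup as in the question (hypothetical constant, for illustration only).
    static final String HTML =
        "<div class=\"movie\"><div>"
        + "<div><strong>Year:</strong> 2014</div>"
        + "<div><strong>Country:</strong> USA</div>"
        + "</div></div>";

    // Maps each <strong> label to the text that sits next to it in the same <div>.
    static Map<String, String> extract(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> values = new LinkedHashMap<>();
        for (Element label : doc.select("div.movie strong")) {
            // ownText() skips the text of child elements such as <strong>,
            // so only the value ("2014", "USA") remains.
            values.put(label.text(), label.parent().ownText().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> values = extract(HTML);
        System.out.println(values.get("Year:"));
        System.out.println(values.get("Country:"));
    }
}
```

This assumes each row contains exactly one label and one value; if a row held several text fragments, `ownText()` would return all of them combined.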