3

Hey, does anyone know how to parse the "Light rain", " 7°C", and "Limited" values? They are stored as #text nodes, which is throwing me off. For reference, to get "Temperature:" I use Element element5 = doc.select("strong").get(3); Thanks!

Kevin
  • 61
  • 8
  • In order to parse them, you need a grammar of some kind. What is the grammar? – Stephen C Jul 04 '20 at 06:13
  • JSoup parses HTML text. You don't have to parse anything. –  Jul 04 '20 at 06:30
  • Perhaps "parse" is the incorrect word? I'm trying to access "Light Rain" as doc.select("text").get(number) but that seems to produce a null pointer – Kevin Jul 04 '20 at 07:34

2 Answers

2

The nodes in your example are called text nodes. In Jsoup, you can read the text of an element with the text() method. So, given your example, we'd select the td element and then call text() to get its text value.

However, text() also includes the text of any child elements, so in your case it would produce Weather: Light rain as a single string. Fortunately, Jsoup also has an ownText() method that only extracts the value from text nodes that are direct children of the element (and not from deeper descendants). So given your example code, you could write it like this:

Element element5 = doc.select("td").get(3);
String value = element5.ownText();
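
To make the difference between text() and ownText() concrete, here is a minimal sketch on a cut-down version of the table. The HTML string, the class name OwnTextDemo, and the two helper methods are assumptions made up for illustration; only Jsoup.parse(), select(), text(), and ownText() come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {

    // Hypothetical sample markup, modelled on the table in the question.
    static final String HTML =
            "<table>"
          + "<tr><td><strong>Weather: </strong>Light rain</td></tr>"
          + "<tr><td><strong>Temperature: </strong>7°C</td></tr>"
          + "</table>";

    // text(): the cell's own text plus the text of its <strong> child.
    static String fullText(int row) {
        Document doc = Jsoup.parse(HTML);
        Element td = doc.select("td").get(row);
        return td.text();
    }

    // ownText(): only the text nodes that sit directly inside the <td>.
    static String valueOnly(int row) {
        Document doc = Jsoup.parse(HTML);
        Element td = doc.select("td").get(row);
        return td.ownText();
    }

    public static void main(String[] args) {
        System.out.println(fullText(0));   // label and value together
        System.out.println(valueOnly(0));  // value only
        System.out.println(valueOnly(1));  // value of the temperature row
    }
}
```

The row index is still hard-coded here, so this shares the fragility of the original get(3) call if rows are added or reordered.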
Martin Devillers
  • 17,293
  • 5
  • 46
  • 88
  • 1
    Hi Martin, thanks for the response! The code above seems to give "Temperature:" and not "7°C". Is there a way to access the "7°C"? – Kevin Jul 04 '20 at 07:41
  • 1
    You're right and I've edited my answer with new code and a better explanation of what's going on. First of all, you need to select the `td` element instead of the `strong` element since you're interested in the value and the value is a child of the `td` element and not the `strong` element. After that, use the `ownText` method to extract the value from the text node (while ignoring the `strong` child) – Martin Devillers Jul 04 '20 at 07:46
  • 1
    Oh I didn't even know about the ownText thing, you're a lifesaver. Thank you so much Martin! – Kevin Jul 04 '20 at 07:48
  • 1
    No problem. You may also be interested in `getElementsContainingOwnText` or `getElementsMatchingOwnText`. This will allow you to search for nodes using a key like `Temperature:` or `Temperature` (regex) instead of the row number. Parsing tables using row numbers may break if a row is ever added or swapped in the future. – Martin Devillers Jul 04 '20 at 07:56
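
The label-based lookup suggested in the last comment could look roughly like the sketch below. It is only an illustration: the class LabelLookupDemo, the helper valueFor, and the sample HTML are hypothetical, and it assumes every label sits in a `strong` element directly inside the `td` that holds the value.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LabelLookupDemo {

    // Hypothetical sample markup, modelled on the table in the question.
    static final String HTML =
            "<table>"
          + "<tr><td><strong>Weather: </strong>Light rain</td></tr>"
          + "<tr><td><strong>Visibility: </strong>Limited</td></tr>"
          + "</table>";

    // Returns the value next to the given label, or null if the label is absent.
    static String valueFor(String html, String label) {
        Document doc = Jsoup.parse(html);
        // Finds elements whose own text contains the label (case-insensitive).
        Elements matches = doc.getElementsContainingOwnText(label);
        for (Element match : matches) {
            Element parent = match.parent();
            // Assumption: the label is a <strong> and its parent cell holds the value.
            if ("strong".equals(match.tagName()) && parent != null) {
                return parent.ownText();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(valueFor(HTML, "Weather:"));
        System.out.println(valueFor(HTML, "Visibility:"));
    }
}
```

Because the lookup keys on the label text rather than a position, adding or swapping rows should not change which value is returned.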
1

You can extract the required text in various ways; one of them is td.childNode(1).toString(). A complete solution is shown below:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // The class name is arbitrary; it only makes the snippet compilable.
    public class Main {

        public static void main(String[] args) {

            // Sample HTML mirroring the table in the question
            String htmlString = "<html>\n" +
                    " <head></head>\n" +
                    " <body>\n" +
                    "  <table class=\"table\"> \n" +
                    "   <tbody>\n" +
                    "    <tr> \n" +
                    "     <td><strong>Weather: </strong>Light Rain</td> \n" +
                    "    </tr> \n" +
                    "    <tr> \n" +
                    "     <td><strong>Temperature: </strong>70 C</td> \n" +
                    "    </tr> \n" +
                    "    <tr> \n" +
                    "     <td><strong>Visibility: </strong>Limited</td> \n" +
                    "    </tr> \n" +
                    "    <tr> \n" +
                    "     <td><strong>Runs open: </strong>0</td> \n" +
                    "    </tr>\n" +
                    "   </tbody>\n" +
                    "  </table>\n" +
                    " </body>\n" +
                    "</html>";

            // Parse the HTML string using the Jsoup library
            Document html = Jsoup.parse(htmlString);
            Elements tds = html.getElementsByTag("td");
            for (Element td : tds) {
                // Label: the text of the <strong> child, e.g. "Weather:"
                String tdStrongText = td.select("strong").text();
                System.out.print(tdStrongText + " : ");
                // Value: the second child node of the <td>, i.e. the text node
                // after <strong>. This assumes <strong> is always the first child.
                String tdText = td.childNode(1).toString();
                System.out.println(tdText);
            }
        }
    }

Check out code on github.
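
One caveat with childNode(1).toString(): it depends on the text node being at exactly index 1, and toString() returns the node's HTML, so a character such as & comes back as an entity. A slightly more defensive variant, sketched below, walks td.textNodes() and reads each node with text(). The class TextNodeDemo, its helper methods, and the sample HTML (including the "A &amp; B" row) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class TextNodeDemo {

    // Hypothetical sample markup; the last row contains an escaped ampersand.
    static final String HTML =
            "<table>"
          + "<tr><td><strong>Weather: </strong>Light Rain</td></tr>"
          + "<tr><td><strong>Visibility: </strong>Limited</td></tr>"
          + "<tr><td><strong>Runs: </strong>A &amp; B</td></tr>"
          + "</table>";

    // Returns the first non-blank text node directly inside the cell, or "".
    static String firstOwnTextNode(Element td) {
        for (TextNode node : td.textNodes()) {
            if (!node.isBlank()) {
                return node.text().trim(); // text() yields unescaped text
            }
        }
        return "";
    }

    // Collects the value of every <td> in document order.
    static List<String> values(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element td : doc.getElementsByTag("td")) {
            result.add(firstOwnTextNode(td));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String value : values(HTML)) {
            System.out.println(value);
        }
    }
}
```

For cells with a single trailing text node this gives the same values as ownText(); the loop mainly helps when a cell may contain leading whitespace or no text node at all.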

Ashish Karn
  • 1,127
  • 1
  • 9
  • 20