jSoup getting value of HTML tag

Question

I am reading an html file from the internet and when I read the file, the output to my console is as follows:

<string>
       <String1>
        text
       </String1>
       <level2>
        text2
       </level2>
       <level3>
        text3
       </level3>
       <level4>
        text4
       </level4>
       <level5>
         TEXT
       </level5>
</string>
<string>
           <String2>
            text
           </String2>
           <level2>
            text2
           </level2>
           <level3>
            text3
           </level3>
           <level4>
            text4
           </level4>
           <level5>
             THIS TEXT
           </level5>
    </string>

How can I access the level5 text in the second string? I have been trying all day with no luck and would really appreciate some input from someone who knows more about this.

Here is my code:

String line = null;

            try {
                // FileReader reads text files in the default encoding.
                FileReader fileReader = new FileReader(String.valueOf(doc));

                // Always wrap FileReader in BufferedReader.
                BufferedReader bufferedReader = new BufferedReader(fileReader);

                while ((line = bufferedReader.readLine()) != null) {
                    Elements tdElements = doc.getElementsByTag("level1");
                    for(Element element : tdElements )
                    {
                        //Print the value of the element
                        System.out.println(element.text());
                    }

                }

                // Always close files.
                bufferedReader.close();
            } catch (FileNotFoundException ex) {
                System.out.println(
                        "Unable to open file '" +
                                doc + "'");
            } catch (IOException ex) {
                System.out.println(
                        "Error reading file '"
                                + doc + "'");
                // Or we could just do this:
                // ex.printStackTrace();
            }
        }
//
        catch (IOException e) {
            e.printStackTrace();
        }

@JaredRummler how would I make it so if there were two options I would make sure the conditions were met (under option2 tag instead of option1 tag) before selecting level5? I updated my question above — dchamb, Jan 10 '16 at 00:58
@JaredRummler the real HTML looks like the example, but that code is causing the application to crash. Can you check the HTML again? I updated it. — dchamb, Jan 10 '16 at 01:46

score 1 · Answer 1 · answered Jan 10 '16 at 14:11

The code below uses jsoup to parse the text you were referring to. The variable 'textToParse' holds the HTML you provided above. You can use jsoup's pseudo-selectors to find elements at a specific position in the DOM tree: ":eq(1)" matches an element whose zero-based sibling index is 1, which here is the second string element. Hope this is what you were looking for.

Document document = Jsoup.parse(textToParse);
Elements stringTags = document.select("string:eq(1)");
for(Element e : stringTags) {
    System.out.println(e.select("level5").text());
}

//Output: THIS TEXT

Stephan · Answer 2 · 2016-01-11T11:41:16.580

You can use a CSS selector here:

string:nth-of-type(2) > level5

DEMO: http://try.jsoup.org/~8w_pfCxDhJwIseTKiKsQjQJOBRs

DESCRIPTION

string:nth-of-type(2) /* Select the 2nd string node in document... */
> level5                /* ... then select all "level5" child nodes  */

SAMPLE CODE

Document doc = ...
Element level5Node = doc.select("string:nth-of-type(2) > level5").first();
if (level5Node == null) {
   throw new RuntimeException("Unable to locate level5 text...");
}

System.out.println(level5Node.text()); // THIS TEXT

score 0 · Answer 3 · edited May 23 '17 at 12:23

Solution 1: your HTML is well-formed XML once it is wrapped in a single root element, so you can use XML tools.

You can get the second level5 with XPath: "//string[2]/level5"

Solution 2: parse it with jsoup to get the document, then use XPath as in solution 1.

See Jsoup with XPath / XSoup: Does jsoup support xpath?

Solution 1:

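import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// yourXml holds the markup from the question; the <root> wrapper gives it a single root element.
String xml = "<root>" + yourXml + "</root>";

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//string[2]/level5";
String value = xPath.evaluate(expression, document);
System.out.println("EVALUATE:" + value);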
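
For reference, here is a minimal self-contained sketch of the XPath approach that runs on the JDK alone. The class name Level5XPath, the MARKUP constant, and the evaluate helper are illustrative names, not part of any library, and the markup is a shortened copy of the question's (level2 to level4 are left out). It also shows one possible way to pick level5 by condition rather than by position, assuming the condition is "the string element has a String2 child". Note that XML is case-sensitive, so String2 must be written exactly as in the source.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Level5XPath {

    // Shortened copy of the question's markup (level2..level4 omitted for brevity).
    static final String MARKUP =
            "<string><String1>text</String1><level5> TEXT </level5></string>"
          + "<string><String2>text</String2><level5> THIS TEXT </level5></string>";

    // Wraps the fragment in an illustrative <root> element so the input has a
    // single root, evaluates the XPath expression against it, and returns the
    // string value with surrounding whitespace trimmed.
    static String evaluate(String fragment, String expression) {
        try {
            String xml = "<root>" + fragment + "</root>";
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, document).trim();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // By position: level5 inside the second <string> element.
        System.out.println(evaluate(MARKUP, "//string[2]/level5"));
        // By condition: level5 inside the <string> that has a <String2> child.
        System.out.println(evaluate(MARKUP, "//string[String2]/level5"));
    }
}
```

Both calls print THIS TEXT for this markup. The condition-based expression keeps working if the order of the string elements changes, which the position-based one does not.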
