
I have an HTTP response that I need to parse. More precisely, I want to extract the part of the response inside a given tag. For example:

<div class="row"><span>some text<pre>% Copyright (c) </pre></span></div>

So I'd pass "pre" and the parser would return the content between <pre> and </pre>.

Is there a better way to do this in Java? I can't tell whether HttpMessageParser could do it for me.

Thanks in advance!

3 Answers


Assuming there is only one pre tag in the response, you can use String's indexOf() and substring() methods to get what you want.

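    String response = "<div class=\"row\"><span>some text<pre>% Copyright (c) </pre></span></div>";

    // "<pre>" is 5 characters long, so skip its full length to start right after the tag
    String insidePre = response.substring(response.indexOf("<pre>") + "<pre>".length(), response.indexOf("</pre>"));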
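The question asks to pass a tag name such as "pre" and get back what sits between the opening and closing tag, so the substring approach above can be wrapped in a small helper. This is a minimal sketch, assuming the opening tag has no attributes (exactly `<pre>`, not `<pre class="x">`) and that tags are not nested; `TagExtractor` and `extractBetweenTags` are hypothetical names. Unlike the one-liner, it returns an empty `Optional` instead of throwing when a tag is missing, and it searches for the closing tag only after the opening one.

```java
import java.util.Optional;

// Hypothetical helper: plain string search, no real HTML parsing.
public final class TagExtractor {

    private TagExtractor() {}

    /**
     * Returns the text between the first <tag> and the following </tag>.
     * Assumes the opening tag has no attributes and tags are not nested.
     */
    public static Optional<String> extractBetweenTags(String html, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";

        int start = html.indexOf(open);
        if (start < 0) {
            return Optional.empty();              // opening tag not found
        }
        start += open.length();                   // skip past "<tag>" itself

        int end = html.indexOf(close, start);     // look for the closing tag only after the opening one
        if (end < 0) {
            return Optional.empty();              // closing tag not found
        }
        return Optional.of(html.substring(start, end));
    }

    public static void main(String[] args) {
        String response = "<div class=\"row\"><span>some text<pre>% Copyright (c) </pre></span></div>";
        System.out.println(extractBetweenTags(response, "pre").orElse("(not found)"));
    }
}
```

Because it is still plain string matching, anything beyond simple, predictable markup (attributes, nesting, comments) calls for a real parser.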
Imesha Sudasingha

I don't think HttpMessageParser is the correct tool here, because it is intended for parsing HTTP messages regardless of whether they contain HTML. For simple parsing, you can use methods from the String class, such as substring() and indexOf(). For somewhat more complex parsing, you can use regular expressions. If you need something that actually understands HTML syntax, I suggest searching for an HTML parser library.
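As a rough illustration of the regular-expression route, the sketch below captures the content of every `<tag>...</tag>` pair with a reluctant quantifier. It makes the same simplifying assumptions as plain string searching (no attributes on the opening tag, no nesting, no comments or CDATA containing the tag), which is why an HTML parser is safer for anything less predictable. `RegexTagExtractor` and `extractAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based helper; not a real HTML parser.
public final class RegexTagExtractor {

    private RegexTagExtractor() {}

    /** Returns the contents of every <tag>...</tag> pair, in document order. */
    public static List<String> extractAll(String html, String tag) {
        String name = Pattern.quote(tag);         // treat the tag name literally
        // (.*?) is reluctant, so each match stops at the nearest closing tag.
        // DOTALL lets '.' match line breaks; CASE_INSENSITIVE because HTML tag names are case-insensitive.
        Pattern pattern = Pattern.compile(
                "<" + name + ">(.*?)</" + name + ">",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(html);

        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1));        // group 1 = text between the tags
        }
        return results;
    }

    public static void main(String[] args) {
        String response = "<div class=\"row\"><span>some text<pre>% Copyright (c) </pre></span></div>";
        System.out.println(extractAll(response, "pre"));
    }
}
```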

Code-Apprentice
  • @MrMoose This is a good point if the OP wants to completely parse the entire HTML page into a complete syntax tree. However, if the OP is doing something more specific, then a regex can work. The "best" answer completely depends on the amount of sophistication the OP needs. It is difficult to tell how much this is from the little bit in the OP. – Code-Apprentice Aug 18 '16 at 18:09
  • Sure. I didn't mean it as a criticism of your answer, but parsing HTML isn't trivial, and I'd suggest that [Regex is never the way to go](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Your follow-up suggestion of an HTML parser is the best advice, I think. OP, maybe check out the examples in [this SO post](http://stackoverflow.com/questions/773340/can-you-provide-examples-of-parsing-html) and [its answer specific to Java](http://stackoverflow.com/a/774519/685760). – Mr Moose Aug 19 '16 at 02:30
  • @MrMoose I'm certainly open to constructive comments to improve my answer. My previous comment wasn't intended in a defensive tone. Rather I was trying to further explain the thinking behind my answer. I'm very wary of such absolutes as "always" and "never". Generally I am of the opinion of using the simplest solution for a problem and no simpler. The OP's goals aren't entirely clear from what is given here. This is why I provide several suggestions in my answer. – Code-Apprentice Aug 19 '16 at 14:33

Your input appears to be valid XML, so XPath is an easy and clean approach:

The XPath expression is //pre/text(), which finds the pre element and retrieves its text content.

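    import java.io.StringReader;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;
    import org.xml.sax.InputSource;

    String input = "<div class=\"row\"><span>some text<pre>% Copyright (c) </pre></span></div>";

    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xpath = xPathFactory.newXPath();

    try {
        // Select the text node(s) directly inside any <pre> element
        XPathExpression expr = xpath.compile("//pre/text()");
        // XPathConstants.STRING yields the string value of the first matching node
        String output = (String) expr.evaluate(new InputSource(new StringReader(input)), XPathConstants.STRING);

        System.out.println(output);
    } catch (XPathExpressionException e) {
        // Thrown if the expression is invalid or the input cannot be parsed as XML
        e.printStackTrace();
    }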
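With `XPathConstants.STRING`, XPath returns only the string value of the first matching node. If a response might hold several `<pre>` elements, a sketch along these lines asks for `XPathConstants.NODESET` and collects each element's text. It keeps the answer's assumption that the response is well-formed XML (real-world HTML often is not, e.g. an unclosed `<br>`), and it also assumes the tag name is a plain element name, since it is concatenated into the expression. It selects the elements with `//tag` rather than `text()` nodes, so markup nested inside `<pre>` is flattened into one string by `getTextContent()`. `XPathTagExtractor` and `extractAll` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK's javax.xml.xpath API.
public final class XPathTagExtractor {

    private XPathTagExtractor() {}

    /** Returns the text content of every element named {@code tag}, in document order. */
    public static List<String> extractAll(String xml, String tag) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // "//" matches the element at any depth in the document
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//" + tag, new InputSource(new StringReader(xml)), XPathConstants.NODESET);

            List<String> results = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Concatenated text of the element and all of its descendants
                results.add(nodes.item(i).getTextContent());
            }
            return results;
        } catch (XPathExpressionException e) {
            // Invalid expression or unparseable input; rethrown unchecked for simpler call sites
            throw new IllegalArgumentException("Could not evaluate XPath on the input", e);
        }
    }

    public static void main(String[] args) {
        String input = "<div class=\"row\"><span>some text<pre>% Copyright (c) </pre></span></div>";
        System.out.println(extractAll(input, "pre"));
    }
}
```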
SomeDude