
I call a web service that returns some HTML enclosed in an XML envelope, something like:

<xml version="1.0" cache="false">
    <text color="white">
        <p> Some text <br /> <p>
    </text>
</xml>

I use XmlPullParser to parse this XML/HTML. To get the text inside the element, I do the following:

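case XmlPullParser.START_TAG:

    xmlNodeName = parser.getName();

    if (xmlNodeName.equalsIgnoreCase("text")) {
        String color = parser.getAttributeValue(null, "color");
        // nextText() expects the element to contain text only; it throws an
        // XmlPullParserException if it meets a child start tag such as <p>
        String text = parser.nextText();

        // "white".equalsIgnoreCase(...) stays null-safe if the attribute is missing
        if ("white".equalsIgnoreCase(color)) {
            detail.setDetail(Html.fromHtml(text).toString());
        }
    }
break;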

This works well and gets the text or HTML in the element, even if it contains some HTML tags.

The issue arises when the element's data starts with a <p> tag, as in the example above. In this case the data is lost and the text is empty.

How can I resolve this?

EDIT

Thanks to Nik and Rajesh for pointing out that my service's response is actually not valid XML and that the <p> element is not closed properly. But I have no control over the service, so I cannot edit what is returned. Is there something like HTML Agility that can parse any kind of malformed HTML, or at least get what is inside a given tag, like inside <text> ... </text> in my case? That would also be good.

Or anything else I can use to parse what I get from the service would be good, as long as it is reasonably easy to implement.
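If all that is needed is the raw markup between `<text ...>` and `</text>`, a plain regular expression over the response string can stand in for a real parser. A minimal sketch, assuming the response contains one non-nested `<text>` element whose content never includes a literal `</text>`; the names `TextElementExtractor` and `extractTextElement` are hypothetical. This is pattern matching, not HTML parsing, so it tolerates the unclosed `<p>` but does not repair anything.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: grabs the raw content of the first <text ...>...</text>
// element without parsing it, so malformed HTML inside does not matter.
final class TextElementExtractor {
    // "<text", optional attributes, then everything up to the first "</text>";
    // DOTALL lets (.*?) span line breaks, CASE_INSENSITIVE also accepts <TEXT>
    private static final Pattern TEXT_ELEMENT = Pattern.compile(
            "<text\\b[^>]*>(.*?)</text>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the inner markup of the text element, or null if there is none.
    static String extractTextElement(String response) {
        Matcher m = TEXT_ELEMENT.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String response = "<xml version=\"1.0\" cache=\"false\">\n"
                + "    <text color=\"white\">\n"
                + "        <p> Some text <br /> <p>\n"
                + "    </text>\n"
                + "</xml>";
        // prints: <p> Some text <br /> <p>
        System.out.println(extractTextElement(response).trim());
    }
}
```

On Android the returned string could then go to `Html.fromHtml(...)` as in the code above. If the `color` attribute is also needed, `[^>]*` can be wrapped in its own capture group.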

Excuse my bad English.

Aamir

3 Answers


You are seeing that behavior because what you have inside the <text>...</text> tags is not a text node but an XML element node (<p>). You should enclose the contents in a CDATA section.
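Since the response itself cannot be changed, one way to apply the CDATA idea on the client is to rewrite the string before it reaches the parser, wrapping whatever sits between `<text ...>` and `</text>` in `<![CDATA[ ... ]]>`. With the content inside a CDATA section, `XmlPullParser.next()` reports it as TEXT, so `nextText()` should then return the raw HTML. A minimal sketch, assuming `<text>` elements are not nested; `CdataWrapper` and `wrapTextInCdata` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: wraps the content of every <text ...> element in a CDATA
// section so an XML parser treats the embedded HTML as plain character data.
final class CdataWrapper {
    // group 1: opening tag, group 2: raw content, group 3: closing tag
    private static final Pattern TEXT_ELEMENT = Pattern.compile(
            "(<text\\b[^>]*>)(.*?)(</text>)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String wrapTextInCdata(String xml) {
        Matcher m = TEXT_ELEMENT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A literal "]]>" would end the section early, so split it across two sections
            String body = m.group(2).replace("]]>", "]]]]><![CDATA[>");
            String replacement = m.group(1) + "<![CDATA[" + body + "]]>" + m.group(3);
            // quoteReplacement stops '$' or '\' in the HTML from being read as group refs
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String response = "<text color=\"white\"><p> Some text <br /> <p></text>";
        // prints: <text color="white"><![CDATA[<p> Some text <br /> <p>]]></text>
        System.out.println(wrapTextInCdata(response));
    }
}
```

The rewritten string can then be handed to the pull parser with `parser.setInput(new StringReader(...))`.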

Edit: here is the code for the suggestion in my comment. It does indeed work with the sample XML you gave.

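    // Rebuild the HTML inside <text>...</text> from the pull-parser events.
    // parser.next() can throw XmlPullParserException and IOException.
    StringBuilder html = new StringBuilder();
    boolean isText = false; // true while we are between <text> and </text>
    int eventType = parser.getEventType();
    while (eventType != XmlPullParser.END_DOCUMENT) {
        if (eventType == XmlPullParser.START_TAG) {
            String name = parser.getName();
            if (name.equalsIgnoreCase("text")) {
                isText = true;
            } else if (isText) {
                // re-create the opening tag (attributes are not copied);
                // a self-closing <br /> arrives as START_TAG + END_TAG, i.e. <br></br>
                html.append('<').append(name).append('>');
            }
        } else if (eventType == XmlPullParser.END_TAG) {
            String name = parser.getName();
            if (name.equalsIgnoreCase("text")) {
                isText = false;
            } else if (isText) {
                html.append("</").append(name).append('>');
            }
        } else if (eventType == XmlPullParser.TEXT) {
            if (isText) {
                html.append(parser.getText());
            }
        }
        eventType = parser.next();
    }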
Rajesh
  • +1 to you, but I cannot edit what is returned; I have no access to the web service that returns this data. – Aamir Apr 19 '12 at 15:46
  • If you cannot modify the XML, you could build your HTML from whatever comes after the "text" START_TAG until its END_TAG. For example, set a flag in your START_TAG handler if the name is "text", and append the tags and text to a StringBuffer until the END_TAG is encountered; there, use the buffer and reset the flag. – Rajesh Apr 19 '12 at 17:31
  • Thanks Rajesh, but I don't think I can proceed past the START_TAG: when I try to get what comes after it through _parser.nextText()_, I get an exception. If you have encountered or solved such an issue, an example would be very helpful. – Aamir Apr 19 '12 at 17:41
  • I think you did not fully understand the suggestion. Please see the updated answer; I have tested it with the XML you gave. – Rajesh Apr 19 '12 at 18:08

That happens because you never close the <p> tag in the XML above. Use this line instead:

<p> Some text <br /> </p>

Nikhil
  • Thanks Nik, but this data is returned from a web service that is not in my control at all. Can you please suggest some way to get whatever is inside the ... element? – Aamir Apr 19 '12 at 11:44

Solution

Inspired by Martin's approach of first converting the received data to a string, I solved my problem with a kind of mixed approach.

Convert the received InputStream to a string and replace the erroneous tags with "" (or whatever you wish), as follows:

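// Read the whole response into a String, dropping the unbalanced <p> and </p> tags.
// try-with-resources closes the reader (and the underlying stream) when done.
StringBuilder xmlAsString = new StringBuilder(512);
try (BufferedReader br = new BufferedReader(new InputStreamReader(serviceReturnedStream))) {
    String line;
    while ((line = br.readLine()) != null) {
        // readLine() strips the line break; add it back so words on
        // adjacent lines do not run together
        xmlAsString.append(line.replace("<p>", "").replace("</p>", "")).append('\n');
    }
} catch (IOException e) {
    e.printStackTrace();
}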

Now I have a string that contains well-formed XML (in my case), so I can use the normal XmlPullParser to parse it instead of parsing it manually:

XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(false);
XmlPullParser parser = factory.newPullParser();
parser.setInput(new StringReader(xmlAsString.toString()));
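The `replace("<p>", "")` and `replace("</p>", "")` calls only catch the exact lowercase tags; an uppercase `<P>` or a `<p class="...">` would slip through and break the parser again. A slightly more tolerant variant, sketched with a case-insensitive regular expression; `ParagraphTagStripper` is a hypothetical name, and like the original it removes every paragraph tag in the document, not only those inside `<text>`.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes <p>, </p>, <P>, <p class="...">, <p/> and similar,
// while \b leaves tags such as <pre> or <param> untouched.
final class ParagraphTagStripper {
    private static final Pattern P_TAG =
            Pattern.compile("</?p\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    static String stripParagraphTags(String xml) {
        return P_TAG.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String response = "<text color=\"white\"><P class=\"a\"> Some text <br /> <p></text>";
        // prints: <text color="white"> Some text <br /> </text>
        System.out.println(stripParagraphTags(response));
    }
}
```

Applied once to the complete string, for example `stripParagraphTags(xmlAsString.toString())`, it could replace the per-line `replace` calls.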

Hope this helps someone!

Aamir