I'm trying to parse XML returned by a PHP request URI: http://caracasfutbolclub.com/service/news.php. When I log the String xml after fetching it, the response is complete and everything looks fine, except that '<' has been converted to '&lt;', and likewise for the rest of the HTML tags (this may be a UTF-8 or other encoding issue). The real problem appears when I request the elements of each node: the 'title' XML tag is retrieved as it should be, but the 'introtext' tag yields only a '<' instead of all the encoded HTML inside the tag.

Note: this is not just a display problem. If you log right after "map.put("introtext", XMLfunctions.getValue(e, "introtext"));", you will see that the whole string is only the '<'.

The code that I'm using is the following:

MainActivity:

    ArrayList<HashMap<String, String>> mylist = new ArrayList<HashMap<String, String>>();


    String xml = XMLfunctions.getXML(); // fetches the whole XML response as a String
    Document doc = XMLfunctions.XMLfromString(xml);
    Log.d("XML" , xml);


    NodeList nodes = doc.getElementsByTagName("New");

    for (int i = 0; i < nodes.getLength(); i++) {                           
        HashMap<String, String> map = new HashMap<String, String>();    

        Element e = (Element)nodes.item(i);
        map.put("title", XMLfunctions.getValue(e, "title"));
        map.put("introtext", XMLfunctions.getValue(e, "introtext"));
        map.put("created", "Publicado: " + XMLfunctions.getValue(e, "created"));
        mylist.add(map);            
    }

XMLfunctions:

public class XMLfunctions {

public final static Document XMLfromString(String xml){

    Document doc = null;

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {

        DocumentBuilder db = dbf.newDocumentBuilder();

        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xml));
        doc = db.parse(is); 

    } catch (ParserConfigurationException e) {
        System.out.println("XML parse error: " + e.getMessage());
        return null;
    } catch (SAXException e) {
        System.out.println("Wrong XML file structure: " + e.getMessage());
        return null;
    } catch (IOException e) {
        System.out.println("I/O exception: " + e.getMessage());
        return null;
    }

    return doc;

}

/** Returns element value
  * @param elem element (it is XML tag)
  * @return Element value otherwise empty String
  */
 public final static String getElementValue( Node elem ) {
     Node kid;
     if( elem != null){
         if (elem.hasChildNodes()){
             for( kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling() ){
                 if( kid.getNodeType() == Node.TEXT_NODE  ){
                     return kid.getNodeValue();
                 }
             }
         }
     }
     return "";
 }

 public static String getXML(){  
        String line = null;

        try {

            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpPost httpPost = new HttpPost("http://caracasfutbolclub.com/service/news.php");

            HttpResponse httpResponse = httpClient.execute(httpPost);
            HttpEntity httpEntity = httpResponse.getEntity();
            line = EntityUtils.toString(httpEntity);

        } catch (UnsupportedEncodingException e) {
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        } catch (MalformedURLException e) {
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        } catch (IOException e) {
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        }

        return line;

}

public static String getValue(Element item, String str) {       
    NodeList n = item.getElementsByTagName(str);        
    return XMLfunctions.getElementValue(n.item(0));
}
}
AdolfoFermo

1 Answer

Wrap your HTML in CDATA guards, e.g.:

<myxmltag><![CDATA[<p>html content</p>]]></myxmltag>
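To illustrate how a DOM parser exposes CDATA-wrapped content, here is a minimal sketch. The class and method names (CdataDemo, parse, firstChildType, introtext) are made up for this example, and it assumes a default, non-coalescing DocumentBuilderFactory. In that setup the wrapped HTML arrives as a CDATA_SECTION_NODE rather than a TEXT_NODE, so a helper that accepts only Node.TEXT_NODE children (like getElementValue in the question) would skip it, while getTextContent() returns it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the code posted in the question.
final class CdataDemo {

    // Parses an XML string into a DOM document.
    // Checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the DOM node type of the first child of the first <introtext> element.
    static short firstChildType(Document doc) {
        Node introtext = doc.getElementsByTagName("introtext").item(0);
        return introtext.getFirstChild().getNodeType();
    }

    // Returns the complete text of the first <introtext> element,
    // including text held in CDATA sections.
    static String introtext(Document doc) {
        return doc.getElementsByTagName("introtext").item(0).getTextContent();
    }
}
```

Calling CdataDemo.introtext(CdataDemo.parse(xml)) on a document whose introtext element is wrapped as shown above should give back the raw HTML string. Note that CDATA content is not entity-decoded, so HTML that is already escaped as '&lt;' inside the CDATA section stays escaped.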

Orlymee
  • Would this work if I add '<![CDATA[' to the String xml between those tags, or is the only option to modify the web service? – AdolfoFermo Jun 19 '12 at 13:48
  • What do you mean by String xml? You wanted the XML parser to skip the HTML string and not get confused by '<'; CDATA will do that. Post your XML if you are still having problems. – Orlymee Jun 19 '12 at 13:58
  • I mean the variable called 'xml' in the code. Should I search for the tag and add the CDATA line each time it is found? Or is the only way to modify the web service, by programming it to add those lines? You can check the XML at the PHP link that I posted initially. – AdolfoFermo Jun 19 '12 at 16:36
  • So one option is to add the CDATA tag to your XML when it is generated, but keep in mind that this will affect any other applications consuming the same feed. By the looks of it you are sending some news content; just use XML tags to describe the content and use a stylesheet to display it all. On that note, why not just use RSS? – Orlymee Jun 19 '12 at 16:47
  • RSS is not an option; the web page is not built to work that way. Also, the web service was built only for my use, so it's not going to break other apps. I added these lines before converting to a Document: 'String firstXml = XMLfunctions.getXML();' 'String secondXml = firstXml.replace("", " <![CDATA[");' 'xml = secondXml.replace("", "]]> ");' 'Document doc = XMLfunctions.XMLfromString(xml);'. Apparently the CDATA changed something, because I no longer see the '<'; now what I'm parsing is blank. – AdolfoFermo Jun 20 '12 at 13:02
  • :( Making something so simple so complicated! Read up; there are plenty of posts on SO and on other resources, e.g. http://stackoverflow.com/questions/4827344/how-to-parse-xml-using-the-sax-parser – Orlymee Jun 20 '12 at 13:19
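
For reference, here is a hedged sketch of a text-extraction helper that avoids both symptoms reported in this thread. It rests on an assumption the thread does not confirm: that the parser on the device delivers the escaped content as several adjacent text nodes, so returning only the first one yields just '<'. The blank result after adding CDATA fits a helper that ignores CDATA_SECTION_NODE children. The class name XmlText and the method firstValue are hypothetical; getElementValue keeps the signature used in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; a possible replacement for the question's getElementValue.
final class XmlText {

    // Concatenates every text and CDATA child of the element instead of
    // returning only the first text node. Returns "" for a null element.
    static String getElementValue(Node elem) {
        if (elem == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Node kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            short type = kid.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(kid.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Convenience wrapper for trying the helper: parses the XML string and
    // returns the value of the first element with the given tag name.
    static String firstValue(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return getElementValue(doc.getElementsByTagName(tag).item(0));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With this helper, an introtext element containing '&lt;p&gt;' style escaped HTML and one containing a CDATA section should both come back as the full HTML string, so the web service would not need to change. Node.getTextContent() is a built-in alternative, although it also includes the text of nested child elements.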