
I am trying to develop an XML export feature so that my application's users can export their data in XML format. I had this feature ready and working until it started failing in some cases. Then I realized that this was because of special characters that need to be encoded. For example, the data might contain & or ! or % or ' or # etc., and these need to be escaped properly. I was wondering if there is a generic utility that can escape all of the special characters as per the XML specification. I couldn't find anything on Google.

Is there something like that already available? Or is there any other way to do it?

Here is the code I am using to generate the XML:


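import java.util.Vector;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.xerces.dom.DocumentImpl;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Inside the export method (exportData, file and replaceUnrecognizedChars come from the application)
Document xmldoc = new DocumentImpl();
Element root = xmldoc.createElement("Report");
xmldoc.appendChild(root);

Element chartName = xmldoc.createElement((exportData.getChartName() == null) ? "Report" : exportData.getChartName());
if (exportData.getExportDataList().size() > 0
    && exportData.getExportDataList().get(0) instanceof Vector) {
    // First row is the HEADER, i.e. the column names
    Vector<?> name = (Vector<?>) exportData.getExportDataList().get(0);
    for (int i = 1; i < exportData.getExportDataList().size(); i++) {
        Vector<?> value = (Vector<?>) exportData.getExportDataList().get(i);
        Element sub_root = xmldoc.createElement("Data");
        // One child element per column: the header text becomes the tag name
        for (int j = 0; j < name.size(); j++) {
            Element node = xmldoc.createElementNS(null, replaceUnrecognizedChars(String.valueOf(name.get(j))));
            Node node_value = xmldoc.createTextNode(String.valueOf(value.get(j)));
            node.appendChild(node_value);
            sub_root.appendChild(node);
        }
        chartName.appendChild(sub_root);
    }
}
root.appendChild(chartName);

// Prepare the DOM document for writing
Source source = new DOMSource(root);

// Prepare the output file
Result result = new StreamResult(file);

// Write the DOM document to the file
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);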

Sample XML:


<Data>
    <TimeStamp>2010-08-31 00:00:00.0</TimeStamp>
    <[Name that needs to be encoded]>0.0</[Name that needs to be encoded]>
    <Group_Average>1860.0</Group_Average>
</Data>
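A quick way to see which part of the code above actually fails, assuming only the JDK's built-in DOM and Transformer (the class name DomEscapingDemo is made up for this sketch, and it uses createElement rather than createElementNS for brevity): text passed to createTextNode is escaped automatically when it is serialized, while an element name such as the bracketed one in the sample is rejected by the DOM with a DOMException.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomEscapingDemo {

    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a DOM node to a string, without the XML declaration. */
    public static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the DOM accepts the string as an element name. */
    public static boolean isAcceptedElementName(Document doc, String name) {
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // The JDK DOM throws INVALID_CHARACTER_ERR for names that are not legal XML Names
            return false;
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();

        // Text content: the serializer escapes & and < on output
        Element average = doc.createElement("Group_Average");
        average.appendChild(doc.createTextNode("a & b < c ! % ' #"));
        System.out.println(serialize(average));

        // Element names: nothing is escaped, an illegal name is rejected outright
        System.out.println(isAcceptedElementName(doc, "Group_Average"));
        System.out.println(isAcceptedElementName(doc, "[Name that needs to be encoded]"));
    }
}
```

In other words, the values are not the problem; the header text used as a tag name is.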
Salman A. Kagzi
  • Possible duplicate of [Best way to encode text data for XML in Java?](https://stackoverflow.com/questions/439298/best-way-to-encode-text-data-for-xml-in-java) – tkruse Jun 30 '18 at 15:12
  • I'll just refer you to this prior question that seems to cover the same topic: https://stackoverflow.com/questions/439298/best-way-to-encode-text-data-for-xml-in-java – DGH Nov 29 '10 at 06:07

3 Answers


You can use the Apache Commons Lang library to escape a string:

org.apache.commons.lang.StringEscapeUtils

String escapedXml = StringEscapeUtils.escapeXml("the data might contain & or ! or % or ' or # etc");
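If pulling in Commons Lang is not an option, the same five predefined entities can be escaped with a few lines of plain Java. A minimal sketch (the class name XmlText is hypothetical, and this is not the library's implementation):

```java
public class XmlText {

    /**
     * Escapes the five characters that have special meaning in XML markup.
     * Other characters such as ! % # are legal as-is in text and attribute values.
     */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("the data might contain & or ! or % or ' or # etc"));
    }
}
```

Of the characters listed in the question, only & and ' change. When the XML is built with DOM, as in the question, text nodes are escaped by the serializer anyway, so a helper like this is only needed when writing XML by hand.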

But what you are really looking for is a way to convert any string into a valid XML tag name. For ASCII characters, an XML tag name must begin with one of _:a-zA-Z, followed by any number of characters from _:a-zA-Z0-9.-
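A minimal sketch of such a conversion, following the ASCII-only rule above (the class and method names are hypothetical; the question's replaceUnrecognizedChars could take this shape). It deliberately also maps ':' to '_', since the question's code uses createElementNS, where a colon would be read as a namespace prefix. The mapping is lossy, so two different headers can end up with the same tag name.

```java
public class XmlNames {

    // Allowed first characters (ASCII subset, ':' deliberately excluded)
    static boolean isNameStart(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Allowed later characters: start chars plus digits, '.' and '-'
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    /** Hypothetical helper: maps any string to an ASCII XML tag name (lossy). */
    public static String toXmlName(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (sb.length() == 0 && !isNameStart(c)) {
                // A name may not start with a digit, '.', '-' or any other character:
                // start with '_' and keep c only if it is a legal later character
                sb.append('_');
                if (isNameChar(c)) {
                    sb.append(c);
                }
            } else {
                // Replace every character outside the allowed set with '_'
                sb.append(isNameChar(c) ? c : '_');
            }
        }
        // An empty input still needs a non-empty name
        return sb.length() == 0 ? "_" : sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXmlName("[Name that needs to be encoded]"));
        System.out.println(toXmlName("2010 total"));
    }
}
```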

I am fairly sure there is no library to do this for you, so you will have to implement your own function that converts an arbitrary string to match this pattern, or alternatively make the string the value of an attribute:

<property name="no more need to be encoded, it should be handled by XML library">0.0</property>
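A minimal sketch of the attribute approach with the JDK's DOM API (class and method names are hypothetical): the header goes into setAttribute, and the serializer takes care of escaping characters such as &, < and " in the attribute value. Parsing the output back returns the header string from the usage example unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeExport {

    /** Builds <Data><property name="header">value</property></Data> and serializes it. */
    public static String toXml(String header, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element data = doc.createElement("Data");
            doc.appendChild(data);
            Element property = doc.createElement("property");
            // Arbitrary text is fine here: it is escaped when the attribute is written out
            property.setAttribute("name", header);
            property.setTextContent(value);
            data.appendChild(property);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the XML back and returns the name attribute of the first <property>. */
    public static String readName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element property = (Element) doc.getElementsByTagName("property").item(0);
            return property.getAttribute("name");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String header = "[Name & \"quotes\" <that> needs encoding]";
        String xml = toXml(header, "0.0");
        System.out.println(xml);
        System.out.println(header.equals(readName(xml)));
    }
}
```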
gigadot
  • Thanks. This is a handy one, but the problem is that it only handles < > " & '. I am looking for something more extensive. The string that I want to escape is actually being used as a node name. I have now also added a sample XML to the question. – Salman A. Kagzi Nov 29 '10 at 06:52
  • According to the W3C XML standard, only a limited set of characters can be used in an element tag name. You may want to create a generic node and add your header as the value of an attribute, e.g. – gigadot Nov 29 '10 at 07:01
  • The rule for element tag names is at http://www.w3.org/TR/REC-xml/#NT-Name and it does not allow < > " & '. – gigadot Nov 29 '10 at 07:05
  • Thank you all for your comments. I think the best course of action would be to use a format something like this. I would in any case need to encode the name; for that I can use the StringEscapeUtils class. – Salman A. Kagzi Nov 29 '10 at 13:45
  • The escapeXml function converts Unicode characters as well, which it should not. – Mady Jan 24 '12 at 09:57
  • but this is converting < and > as well – Sibish Sep 17 '16 at 00:28
  • To use in Android Studio, add `compile 'org.apache.commons:commons-lang3:3.5'` to your gradle dependencies. See http://stackoverflow.com/a/31496709/529663 for more info. – lenooh Feb 01 '17 at 15:20
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RssParser {
    int length;
    NodeList nodeList;
    Document doc;
    String urlStrToParse, rootNodeName;

    public RssParser(String urlStrToParse, String rootNodeName) {
        this.urlStrToParse = urlStrToParse;
        this.rootNodeName = rootNodeName;
        try {
            // This is the URL of the feed we will parse
            URLConnection urlConn = new URL(this.urlStrToParse).openConnection();

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();

            String s;
            try (InputStream in = urlConn.getInputStream()) {
                s = isToString(in);
            }
            // Escape bare '&' only; existing references such as &amp; or &#169; are left alone
            s = s.replaceAll("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
            // Add an XML declaration only if the feed does not already start with one
            s = s.trim();
            if (!s.startsWith("<?xml")) {
                s = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" + s;
            }

            // Parse the cleaned-up string; the network stream has already been consumed
            doc = db.parse(new InputSource(new StringReader(s)));
            // This is the first node, which contains the other inner element nodes
            nodeList = doc.getElementsByTagName(this.rootNodeName);
            length = nodeList.getLength();
        } catch (ParserConfigurationException pce) {
            System.out.println("Could not Parse XML: " + pce.getMessage());
        } catch (SAXException se) {
            System.out.println("Could not Parse XML: " + se.getMessage());
        } catch (IOException ioe) {
            System.out.println("Invalid XML: " + ioe.getMessage());
        } catch (Exception e) {
            System.out.println("Error: " + e.toString());
        }
    }

    public String isToString(InputStream in) throws IOException {
        // Collect all bytes first and decode once, so multi-byte UTF-8
        // characters are never split across 512-byte reads
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = new byte[512];
        for (int i; (i = in.read(b)) != -1;) {
            out.write(b, 0, i);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public String getVal(int i, String param) {
        Node node = nodeList.item(i);
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return null;
        }
        // Look for <param> inside the i-th root node, not in the whole document
        NodeList titleList = ((Element) node).getElementsByTagName(param);
        if (titleList.getLength() == 0) {
            return null;
        }
        return titleList.item(0).getTextContent();
    }
}
Chintan Raghwani
  • In this code I have made a parser class whose constructor takes two parameters: the first is the URL of the XML feed to read, and the second is the name of the first inner node. The isToString method reads the input stream into a string; in that string I replace the special character "&" with "&amp;" and add the XML version and encoding at the beginning. – Chintan Raghwani Feb 25 '12 at 18:20
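A plain replace("&", "&amp;") also rewrites ampersands that already start an entity, turning &amp; into &amp;amp;. A minimal sketch that escapes only bare ampersands (the class name AmpersandFix is hypothetical; it assumes any &name; in the feed is an entity the parser knows, so HTML-only entities such as &nbsp; would still need separate handling):

```java
import java.util.regex.Pattern;

public class AmpersandFix {

    // Matches '&' only when it does NOT already start an entity or character
    // reference such as &amp; &#169; or &#xA9;
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static String escapeBareAmpersands(String s) {
        return BARE_AMP.matcher(s).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String feed = "<title>Tom & Jerry &amp; friends &#169;</title>";
        System.out.println(feed.replace("&", "&amp;")); // naive: double-escapes &amp; and &#169;
        System.out.println(escapeBareAmpersands(feed)); // only the bare '&' is escaped
    }
}
```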

Use the code below to escape the characters in a String using XML entities. StringEscapeUtils is available in the Apache Commons Lang 3 jar:

StringEscapeUtils.escapeXml11("String to be escaped");
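Escaping covers the five markup characters, but an export can also fail on control characters such as \u0001, which XML 1.0 does not allow at all, not even as a character reference like &#1;. A JDK-only sketch (the class name Xml10Chars is hypothetical, and this is not the Commons Lang implementation) that drops code points outside the XML 1.0 Char production before the text is written:

```java
public class Xml10Chars {

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes code points that may not appear in an XML 1.0 document (including lone surrogates). */
    public static String stripInvalid(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(Xml10Chars::isXml10Char).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "Group\u0001Average\u0008";
        System.out.println(stripInvalid(raw));
    }
}
```

The cleaned string can then be escaped as usual, or passed to createTextNode when building the document with DOM.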
Abhishek Jha