
I have written a method to check my XML strings for &.

I need to modify the method to include the following:

< &lt;

> &gt;

" &quot;

& &amp;

' &apos;

Here is the method

private String xmlEscape(String s) {
    try {
        return s.replaceAll("&(?!amp;)", "&amp;");
    }
    catch (PatternSyntaxException pse) {
        return s;
    }
} // end xmlEscape()

Here is the way I am using it

 sb.append("            <Host>" + xmlEscape(url.getHost()) + "</Host>\n");

How can I modify my method to incorporate the rest of the symbols?

EDIT

I think I did not phrase the question correctly. In the xmlEscape() method, I want to check the string for the following chars: < > ' " &. If any of them are found, I want to replace each one with its corresponding escape sequence.

Example: if the string contains the char &, it would be replaced with &amp;.

Can you do something as simple as

try {
   s.replaceAll("&(?!amp;)", "&amp;");
   s.replaceAll("<", "&lt;");
   s.replaceAll(">", "&gt;");
   s.replaceAll("'", "&apos;");
   s.replaceAll("\"", "&quot;");
   return s;
}
catch (PatternSyntaxException pse) {
   return s;
}   
Baz
jkteater
  • Hm, nothing stops you from calling `replaceAll` more than once... maybe I just do not understand the question?! – home Sep 24 '12 at 16:51
  • This one might help as well (2nd hit on Google): http://stackoverflow.com/questions/439298/best-way-to-encode-text-data-for-xml-in-java – home Sep 24 '12 at 16:53
  • The negative lookahead (`(?!amp;)`) is a bug. The input is presumably plain text. Suppose the input is `"In XML, to get an ampersand you need to write '&amp;'"`. Your code will incorrectly leave the `&amp;` as is. – Laurence Gonsalves Sep 24 '12 at 16:57
  • You should most definitely NOT be rewriting this yourself. There are several pre-existing libraries that provide fully debugged and RFC-compliant versions of this function. Don't reinvent the wheel. – Jim Garrison Sep 24 '12 at 22:51

2 Answers


You may want to consider using the Apache Commons StringEscapeUtils.escapeXml method or one of the many other XML escaping utilities out there. That gives you correct escaping of XML content, so you do not have to worry about missing something when you later need to escape something other than a host name.
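
If adding a dependency is not an option, a hand-written version is possible, although a library remains the safer choice. Note that the snippet in the question discards the result of every replaceAll call: Java strings are immutable, so each result would have to be assigned back to s. The following is only a minimal sketch, assuming the input is always plain, not yet escaped text (so every & is escaped and no lookahead is used). The class name XmlEscapeSketch is hypothetical, and the sketch does not filter characters that are illegal in XML, such as most control characters.

```java
public class XmlEscapeSketch {

    // Escapes the five XML special characters in a single pass.
    // Assumption: the input is plain text, so an existing "&amp;" is
    // treated as literal text and becomes "&amp;amp;".
    static String xmlEscape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a&lt;b &amp; &quot;c&quot; &gt; &apos;d&apos;
        System.out.println(xmlEscape("a<b & \"c\" > 'd'"));
    }
}
```

Walking the string once avoids the ordering problem of chained replacements, where escaping & after < would turn &lt; into &amp;lt;.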

gnomie

Alternatively, have you considered using the StAX (JSR-173) APIs to compose your XML document rather than appending strings together? An implementation is included in the JDK/JRE. This will handle all the necessary character escaping for you:

package forum12569441;

import java.io.*;
import javax.xml.stream.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        // WRITE THE XML
        XMLOutputFactory xof = XMLOutputFactory.newFactory();

        StringWriter sw = new StringWriter();
        XMLStreamWriter xsw = xof.createXMLStreamWriter(sw);
        xsw.writeStartDocument();
        xsw.writeStartElement("foo");
        xsw.writeCharacters("<>\"&'");
        xsw.writeEndDocument();

        String xml = sw.toString();
        System.out.println(xml);

        // READ THE XML
        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        xsr.nextTag(); // Advance to "foo" element
        System.out.println(xsr.getElementText());
    }

}

Output

<?xml version="1.0" ?><foo>&lt;&gt;"&amp;'</foo>
<>"&'
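
Applied to the Host element from the question, the same approach could look roughly like the sketch below. The class name HostElementSketch and the helper hostElement are hypothetical, and wrapping the checked exception in an unchecked one is just one possible choice. In a real program you would probably write the whole document through one XMLStreamWriter instead of building each element separately.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class HostElementSketch {

    // Builds a <Host> element and lets the StAX writer escape the text.
    static String hostElement(String host) {
        try {
            StringWriter sw = new StringWriter();
            XMLStreamWriter xsw =
                    XMLOutputFactory.newFactory().createXMLStreamWriter(sw);
            xsw.writeStartElement("Host");
            xsw.writeCharacters(host); // special characters are escaped here
            xsw.writeEndElement();
            xsw.flush();
            xsw.close();
            return sw.toString();
        } catch (XMLStreamException e) {
            // Assumption: callers prefer an unchecked exception.
            throw new IllegalStateException("Could not write Host element", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hostElement("example.com/a&b<c"));
    }
}
```

Exactly which characters get escaped in element text (for example whether quotes are left as they are, as in the output above) depends on the StAX implementation, but the result should be well-formed either way.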
bdoughan
  • +1 - if you want XML then use an XML tool that knows all the rules, otherwise it's far too easy to end up with something not well-formed that other XML tools can't parse. – Ian Roberts Sep 25 '12 at 07:29