
I have the following XML snippet in a string (note the double-encoded &):

```
...
&lt;PARA&gt;
S&amp;amp;P
&lt;/PARA&gt;
...
```

My desired output would be:

> `... <PARA> S&amp;P </PARA> ...`

If I use `StringEscapeUtils.unescapeXml()`, the actual output is:

> `... <PARA> S&P </PARA> ...`

It seems that `StringEscapeUtils.unescapeXml()` unescapes the input twice, or repeatedly for as long as it still contains entities.
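The difference between unescaping once and unescaping "as long as it contains entities" can be shown with a small sketch. This is hypothetical illustration code, not the Commons implementation: `OnceVsRepeated`, `unescapeOnce` and `unescapeRepeatedly` are made-up names, only the five predefined XML entities are handled, and `Map.of` assumes Java 9 or later. The single pass never re-examines text it has just produced, so `&amp;amp;` becomes `&amp;`; the repeated version keeps going until nothing changes and ends at `&`.

```java
import java.util.Map;

public class OnceVsRepeated {

    // The five entities predefined by XML; numeric references are ignored in this sketch.
    private static final Map<String, String> ENTITIES = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    // Single left-to-right pass: output produced by a replacement is never re-examined.
    static String unescapeOnce(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i) : -1;
            String replacement = (semi > i) ? ENTITIES.get(s.substring(i + 1, semi)) : null;
            if (replacement != null) {
                out.append(replacement); // known entity: emit its character, skip past ';'
                i = semi + 1;
            } else {
                out.append(c);           // ordinary character or unknown entity: copy as-is
                i++;
            }
        }
        return out.toString();
    }

    // Repeats the single pass until the string stops changing, which over-unescapes.
    static String unescapeRepeatedly(String s) {
        String previous;
        do {
            previous = s;
            s = unescapeOnce(s);
        } while (!s.equals(previous));
        return s;
    }

    public static void main(String[] args) {
        String input = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
        System.out.println(unescapeOnce(input));       // <PARA> S&amp;P </PARA>
        System.out.println(unescapeRepeatedly(input)); // <PARA> S&P </PARA>
    }
}
```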

Is there a better utility method, or a simple solution, that unescapes every XML entity (not just a few, but all accented characters as well) exactly once, so that my encoded & part does not get screwed up?

Thanks, Peter

Peter Jaloveczki
  • I am sure you have already checked [this link](https://stackoverflow.com/questions/2833956/how-to-unescape-xml-in-java) but I put it here just in case. I hope it helps. – Alp Nov 16 '17 at 11:14

2 Answers


When you use third-party libraries, you should include the library name and the version.

`StringEscapeUtils` is part of Apache Commons Text and of Apache Commons Lang, where the class is deprecated in favor of the Commons Text version. The latest versions (as of November 2017) are Commons Text 1.1 and Commons Lang 3.7. Both give the correct result:

```java
import org.apache.commons.text.StringEscapeUtils;

public class EscapeTest {
  public static void main(String[] args) {
    final String s = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
    System.out.println(StringEscapeUtils.unescapeXml(s));
  }
}
```

Output: `<PARA> S&amp;P </PARA>`

vanje
  • Thanks. The reason was that this text was part of an XML text node: when the document was built, the parser automatically unescaped the entity references. This is a little strange to me, but I was just looking into it when you wrote the answer. – Peter Jaloveczki Nov 16 '17 at 11:39
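The comment above points at the real cause: an XML parser resolves entity references in text nodes while it builds the document, so the string later passed to `unescapeXml` had already been unescaped once. A minimal sketch using the JDK's built-in DOM parser shows one level disappearing during parsing (`ParserUnescapeDemo` and `rootText` are hypothetical names; the input is a simplified stand-in for the asker's document):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParserUnescapeDemo {

    // Parses raw XML and returns the text content of its root element.
    // The parser resolves entity references while building the DOM.
    static String rootText(String rawXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rawXml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // In the serialized XML the text node is written as "S&amp;amp;P" ...
        String text = rootText("<PARA>S&amp;amp;P</PARA>");
        // ... but after parsing it is already "S&amp;P": one level is gone.
        System.out.println(text); // S&amp;P
        // Unescaping this value again (e.g. with StringEscapeUtils.unescapeXml) would give "S&P".
    }
}
```

So in a setup like this, text read from the DOM has typically been unescaped once already, and running `unescapeXml` on top of it removes a second level.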

Perhaps a long-winded way of doing it, but I can't use Apache Commons:

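```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeXmlExample {

    // "&name;", "&#DDD;" or "&#xHHH;". Excluding '&' and whitespace from the name stops a
    // stray '&' (as in "AT&T") from swallowing text up to the next real entity's ';'.
    private static final Pattern XML_ENTITY = Pattern.compile("&(#?)([^;&\\s]+);");

    // The five entities predefined by XML.
    private static final Map<String, String> BUILTIN_ENTITIES = buildBuiltinXMLEntityMap();

    public static void main(String[] args) {
        String a = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
        String ea = unescapeXML(a);
        System.out.println(ea);
    }

    // Unescapes every entity exactly once: text produced by a replacement is never re-scanned.
    public static String unescapeXML(final String xml) {
        StringBuffer unescapedOutput = new StringBuffer(xml.length());
        Matcher m = XML_ENTITY.matcher(xml);
        while (m.find()) {
            String hashmark = m.group(1);
            String ent = m.group(2);
            String entity;
            if (!hashmark.isEmpty()) {
                entity = decodeCharacterReference(ent);
            } else {
                entity = BUILTIN_ENTITIES.get(ent);
            }
            if (entity == null) {
                entity = m.group(); // unknown or malformed entity: keep it unchanged
            }
            // quoteReplacement stops '$' and '\' in the decoded text from being
            // interpreted as group references by appendReplacement.
            m.appendReplacement(unescapedOutput, Matcher.quoteReplacement(entity));
        }
        m.appendTail(unescapedOutput);
        return unescapedOutput.toString();
    }

    // Decodes the part after "&#": decimal ("38") or hexadecimal ("x26").
    // Returns null if it is not a valid code point.
    private static String decodeCharacterReference(String ent) {
        try {
            int code = ent.startsWith("x")
                    ? Integer.parseInt(ent.substring(1), 16)
                    : Integer.parseInt(ent);
            // toChars also handles code points above U+FFFF (surrogate pairs).
            return new String(Character.toChars(code));
        } catch (IllegalArgumentException e) { // includes NumberFormatException
            return null;
        }
    }

    private static Map<String, String> buildBuiltinXMLEntityMap() {
        Map<String, String> entities = new HashMap<>(10);
        entities.put("lt", "<");
        entities.put("gt", ">");
        entities.put("amp", "&");
        entities.put("apos", "'");
        entities.put("quot", "\"");
        return entities;
    }
}
```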

Output: `<PARA> S&amp;P </PARA>`
achAmháin