Why can't I use org.apache.commons.lang.StringEscapeUtils to convert a String containing entities such as &apos; and &egrave;?

Question

I am experimenting with the org.apache.commons.lang.StringEscapeUtils class, but I am running into some difficulties.

I have the following situation in my code:

String notNormalized = "c&apos;&egrave;";

System.out.println("NOT NORMALIZED: " + notNormalized);
System.out.println("NORMALIZED: " + StringEscapeUtils.escapeJava(notNormalized));

First I declare the notNormalized variable, which (at least in my head) represents a non-normalized string containing an apostrophe encoded as &apos; and an accented vowel encoded as &egrave; (which should be the è character).

Then I print it without normalization, expecting the raw c&apos;&egrave; string, followed by its normalized version, where I expect to get the converted string c'è.

The problem is that I get the same output both times. This is what appears in the console:

NOT NORMALIZED: c&apos;&egrave;
NORMALIZED: c&apos;&egrave;

Why? What am I missing? How can I perform this test and correctly convert a string that contains entities such as &apos;?

score 0 · Answer 1 · edited May 23 '17 at 11:43

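escapeJava escapes a string according to Java string-literal rules (quotes, backslashes, control and non-ASCII characters), so it leaves HTML entities untouched. What you're looking for is unescapeHtml4.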

So

System.out.println("NORMALIZED: " + StringEscapeUtils.unescapeHtml4(notNormalized));

which prints

NORMALIZED: c&apos;è

Unfortunately, &apos; is not an HTML 4 entity and therefore can't be unescaped with this method. You can use unescapeXml for &apos;, but not for &egrave;. You'll have to mix and match.
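With Commons Lang 3, mixing and matching presumably means chaining unescapeXml and unescapeHtml4. One caveat with chaining is that an input like &amp;egrave; gets decoded twice. As a rough illustration of a single-pass alternative, here is a dependency-free sketch using only the JDK (Java 9+). The EntityDecoder class, its decode method, and the tiny entity table are hypothetical names made up for this example; they are not part of Commons Lang, and the table covers only the entities from the question plus a few basics.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Hypothetical, deliberately tiny table: extend it with whatever entities you need.
    private static final Map<String, String> ENTITIES = Map.of(
            "apos", "'",        // XML entity, not defined in HTML 4
            "egrave", "\u00E8", // HTML 4 entity for the e-grave character
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"");

    // Matches named entities such as &apos; or &egrave;
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z]+);");

    // Replaces every known named entity in one pass; unknown entities are left as-is.
    public static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = ENTITIES.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String notNormalized = "c&apos;&egrave;";
        System.out.println("NOT NORMALIZED: " + notNormalized);
        System.out.println("NORMALIZED: " + decode(notNormalized));
    }
}
```

Because each entity is looked up exactly once, decode("&amp;egrave;") returns the literal text &egrave; rather than decoding it a second time. Numeric references such as &#232; are not handled by this sketch.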


answered Mar 17 '15 at 17:23

Sotirios Delimanolis

It says that it doesn't recognize the unescapeHtml4() method – AndreaNobili Mar 17 '15 at 17:26
@AndreaNobili You must be using an older version of apache commons. Upgrade to version 3 if possible. – Sotirios Delimanolis Mar 17 '15 at 17:27
@AndreaNobili Version 2 has a [`unescapeHtml`](https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html#unescapeHtml%28java.lang.String%29) method, but I'm not sure if it's a mix of HTML 3 and 4. – Sotirios Delimanolis Mar 17 '15 at 17:28
