
I am trying to experiment with the org.apache.commons.lang.StringEscapeUtils class, but I am running into some difficulties.

I have the following situation in my code:

String notNormalized = "c&apos;&egrave;";

System.out.println("NOT NORMALIZED: " + notNormalized);
System.out.println("NORMALIZED: " + StringEscapeUtils.escapeJava(notNormalized));

First I declare the notNormalized variable, which (at least in my head) represents a non-normalized string containing an apostrophe, written as &apos;, and an accented vowel, written as &egrave; (which should be the è character).

Then I print it without normalization, expecting the raw c&apos;&egrave; string, and then print its normalized version, expecting the converted c'è string.

But the problem is that I still get the same output. In fact, this is what the console shows:

NOT NORMALIZED: c&apos;&egrave;
NORMALIZED: c&apos;&egrave;

Why? What am I missing? How can I run this test and correctly convert a string that contains entities such as &apos;?

Sotirios Delimanolis
AndreaNobili

1 Answer


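What you're looking for is unescapeHtml4. escapeJava only escapes characters for use in a Java string literal; it does not decode HTML entities.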

So

System.out.println("NORMALIZED: " + StringEscapeUtils.unescapeHtml4(notNormalized));

which prints

NORMALIZED: c&apos;è

Unfortunately, &apos; is not an HTML 4 entity, so unescapeHtml4 leaves it untouched. You can use unescapeXml for &apos;, but it won't handle &egrave;. You'll have to mix and match.
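
One way to mix and match, assuming Commons Lang 3 is on the classpath (unescapeHtml4 lives in org.apache.commons.lang3, not in the org.apache.commons.lang package from the question), would be to chain the two calls: StringEscapeUtils.unescapeXml(StringEscapeUtils.unescapeHtml4(notNormalized)). If you'd rather see the idea without the dependency, here is a rough plain-Java sketch. EntityUnescapeSketch and its tiny entity table are hypothetical names made up for illustration; they are not part of Commons Lang and cover only a handful of entities.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Commons Lang: resolves a small, hand-picked
// set of named entities that spans both the HTML 4 and the XML tables.
final class EntityUnescapeSketch {

    // Matches named entities such as &apos; or &egrave;
    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z]+);");
    private static final Map<String, String> ENTITIES = new HashMap<>();

    static {
        ENTITIES.put("apos", "'");        // defined in XML, not in HTML 4
        ENTITIES.put("egrave", "\u00E8"); // defined in HTML 4, not in XML
        ENTITIES.put("quot", "\"");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("amp", "&");
    }

    static String unescape(String input) {
        Matcher m = NAMED_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Names missing from the table are copied through unchanged.
            String replacement = ENTITIES.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("NORMALIZED: " + unescape("c&apos;&egrave;"));
    }
}
```

The text is scanned in a single pass, so something like &amp;lt; becomes &lt; and is not decoded a second time. For real input the library unescapers are the safer choice, since they also handle numeric references and the full entity tables.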

Sotirios Delimanolis