I have a string like 4,0 &#151; 10,0
I need to decode it to: 4,0 — 10,0
The code can be looked up at https://www.codetable.net/decimal/151
I tried Apache's StringEscapeUtils.unescapeJava without any luck.
It is a numeric entity (numeric character reference), common in HTML, XML, and their base, SGML.
Try Apache's StringEscapeUtils.unescapeHtml4 (unescapeHtml in the old Commons Lang 2.x). This will also take care of named entities like &mdash;.
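Or do it yourself (needs Java 9+ for replaceAll with a lambda):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern entityPattern = Pattern.compile("&#(\\d+);");
String s = "4,0 &#151; 10,0";
// quoteReplacement keeps a decoded '$' or '\' from being read as a group reference
s = entityPattern.matcher(s).replaceAll(mr -> Matcher.quoteReplacement(
        new String(new int[] {Integer.parseInt(mr.group(1))}, 0, 1)));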
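This replaces each entity with the single Unicode code point it names, here 151 (U+0097). Note that U+0097 is a C1 control character; browsers read &#151; as the Windows-1252 em dash (U+2014), so you need an extra mapping if that is the result you want. For hexadecimal numeric entities: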
Pattern entityPattern = Pattern.compile("&#x([\\da-f]+);",
        Pattern.CASE_INSENSITIVE);
String s = "4,0 &#x97; 10,0";
// Same approach as above, but the digits are parsed as base 16
s = entityPattern.matcher(s).replaceAll(mr -> Matcher.quoteReplacement(
        new String(new int[] {Integer.parseInt(mr.group(1), 16)}, 0, 1)));
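If it helps, here is a self-contained sketch that handles the decimal and hexadecimal forms in one pass. The class name NumericEntityDecoder and the method decode are made up for illustration. It assumes Java 9+, and it assumes that code points 128 to 159 should be read as Windows-1252 bytes (which is how browsers treat them, and what turns 151 into an em dash). That second assumption depends on where your data came from, so drop that branch if you want the raw code point instead.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Group 1: hex digits of &#x97; style entities. Group 2: decimal digits of &#151; style.
    private static final Pattern ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Assumption: values 128..159 were meant as Windows-1252 bytes.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    public static String decode(String input) {
        return ENTITY.matcher(input).replaceAll(mr -> {
            int codePoint;
            try {
                codePoint = mr.group(1) != null
                        ? Integer.parseInt(mr.group(1), 16)
                        : Integer.parseInt(mr.group(2));
            } catch (NumberFormatException e) {
                // Too many digits to fit in an int: leave the entity untouched.
                return Matcher.quoteReplacement(mr.group());
            }
            if (!Character.isValidCodePoint(codePoint)) {
                // Not a Unicode code point: leave the entity untouched.
                return Matcher.quoteReplacement(mr.group());
            }
            String decoded;
            if (codePoint >= 0x80 && codePoint <= 0x9F) {
                // Reinterpret the value as a single Windows-1252 byte (151 becomes U+2014).
                decoded = new String(new byte[] {(byte) codePoint}, CP1252);
            } else {
                decoded = new String(Character.toChars(codePoint));
            }
            // Escape '$' and '\' so they are not treated as group references.
            return Matcher.quoteReplacement(decoded);
        });
    }

    public static void main(String[] args) {
        // Prints the decoded string; the console encoding decides how the dash is shown.
        System.out.println(decode("4,0 &#151; 10,0"));
    }
}
```

Named entities such as &mdash; are not touched by this sketch; for those a library like Apache's StringEscapeUtils is still the easier route.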
If you got this string from an HTML form into which the user typed or pasted special characters, the form is probably missing the accept-charset attribute:
<form action="..." accept-charset="UTF-8">
Without it, the browser converts special characters that the form's encoding cannot represent into numeric entities.
This assumes that the web server already serves its pages in UTF-8.