The answers to this question mostly suggest using StringEscapeUtils from Apache Commons Text. But that class (as of commons-text 1.9, the latest version) only supports HTML 4 entities, and Mastodon appears to use HTML 5, which includes &apos;. How can I decode HTML 5 entities, including &apos;?
0__
- Sorry for asking the obvious, but have you tried the *other* suggestions that are given in other answers? – Federico klez Culloca Jun 20 '21 at 17:25
- @FedericoklezCulloca well, I would like to use a fairly standard library, and/or standard Java API _if there is any_. I also wonder why Apache doesn't support HTML 5 entities, or if I'm just missing a newer version of the library. – 0__ Jun 20 '21 at 20:17
1 Answer
unbescape does the job well:
import org.unbescape.html.HtmlEscape;

final String unescapedText = HtmlEscape.unescapeHtml("&apos;");
System.out.println(unescapedText);
Result:
'
Maven:
<!-- https://mvnrepository.com/artifact/org.unbescape/unbescape -->
<dependency>
    <groupId>org.unbescape</groupId>
    <artifactId>unbescape</artifactId>
    <version>1.1.6.RELEASE</version>
</dependency>
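If adding a dependency is not an option, the following is a rough, standard-library-only sketch of the same idea. It is not how unbescape or commons-text work internally. The class name SimpleEntityDecoder and its tiny entity table are made up for illustration; the full HTML 5 list has more than 2,000 named references, so a real decoder would need the complete table. It assumes Java 9 or later (for Map.of and the StringBuilder overload of appendReplacement).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of unbescape or commons-text.
final class SimpleEntityDecoder {

    // Deliberately tiny table (assumption: only these names are needed).
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"",
            "apos", "'",      // the entity HTML 4 tables do not include
            "nbsp", "\u00A0");

    // Matches &name;, decimal &#39; and hexadecimal &#x27; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    // Decodes each reference once; unknown names are left unchanged.
    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement = body.startsWith("#")
                    ? decodeNumeric(body, m.group())
                    : NAMED.getOrDefault(body, m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts "#39" or "#x27" to the matching character, or returns the
    // original text if the number is not a valid code point.
    private static String decodeNumeric(String body, String original) {
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int cp = hex ? Integer.parseInt(body.substring(2), 16)
                         : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : original;
        } catch (NumberFormatException e) {
            return original; // number too large to parse
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("It&apos;s &lt;b&gt; &#39; &#x27;"));
    }
}
```

Because the input is scanned in a single pass, something like &amp;apos; decodes to the literal text &apos; and is not decoded a second time.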

Spectric