The answers to this question mostly suggest using Apache Commons Text's StringEscapeUtils. But that library (the latest version of commons-text is 1.9) only supports HTML 4 entities, and Mastodon appears to use HTML 5, which includes &apos;. How can I decode HTML 5 entities, including &apos;?

0__
  • Sorry for asking the obvious, but have you tried the *other* suggestions that are given in other answers? – Federico klez Culloca Jun 20 '21 at 17:25
  • @FedericoklezCulloca well, I would like to use a fairly standard library, and/or standard Java API _if there is any_. I also wonder why Apache doesn't support HTML 5 entities, or if I'm just missing a newer version of the library. – 0__ Jun 20 '21 at 20:17

1 Answer

unbescape does the job well:

import org.unbescape.html.HtmlEscape;

final String unescapedText = HtmlEscape.unescapeHtml("&apos;");
System.out.println(unescapedText);

Result:

'

Maven:

<!-- https://mvnrepository.com/artifact/org.unbescape/unbescape -->
<dependency>
    <groupId>org.unbescape</groupId>
    <artifactId>unbescape</artifactId>
    <version>1.1.6.RELEASE</version>
</dependency>
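
If adding a dependency is not an option, the idea can be approximated with the standard library alone. What follows is only a rough sketch, not how unbescape or commons-text work internally: the class name ApostropheDecodeSketch, the decode method, and the five-entry entity table are all made up for illustration, and the table covers a tiny fraction of the named references HTML 5 defines. It handles &apos;, a few other common names, and decimal or hexadecimal numeric references in a single pass, leaving anything it does not recognise untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; requires Java 9 or later.
class ApostropheDecodeSketch {

    // Deliberately tiny table (an assumption of this sketch);
    // HTML 5 defines far more named character references than this.
    private static final Map<String, String> NAMED = Map.of(
            "apos", "'",
            "quot", "\"",
            "amp", "&",
            "lt", "<",
            "gt", ">");

    // Matches &name;, &#39; and &#x27; style references.
    private static final Pattern REFERENCE =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = REFERENCE.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String resolved = resolve(m.group(1));
            // Unknown or invalid references are copied through unchanged.
            String replacement = resolved != null ? resolved : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // digits overflowed an int
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("It&apos;s &lt;b&gt; &#39; &#x27;"));
    }
}
```

Because the input is scanned once, an already-escaped ampersand such as &amp;apos; decodes to the literal text &apos; rather than to an apostrophe. For real Mastodon content a complete HTML 5 entity table, as shipped by a library, is still the safer choice.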
Spectric