I have some strings that contain XHTML character entities:
"They&apos;re quite varied"
"Sometimes the string &isin; XML standard, sometimes &isin; HTML4 standard"
"Therefore -&gt; I need an XHTML entity decoder."
"Sadly, some strings are not valid XML & are not-quite-so-valid HTML &lt;- but I want them to work, too."
Is there any easy way to decode the entities? (I'm using Java)
I'm currently using StringEscapeUtils.unescapeHtml4(myString.replace("&apos;", "\'")) as a temporary hack. Sadly, org.apache.commons.lang3.StringEscapeUtils has unescapeHtml4 and unescapeXml, but no unescapeXhtml.
EDIT: I do want to handle invalid XML, for example I want "&&xyzzy;" to decode to "&&xyzzy;"
EDIT: I think HTML5 has almost the same character entities as XHTML, so I think an HTML5 decoder would be fine too.
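
To make the lenient behaviour I'm after concrete, here is a rough sketch in plain Java (9+). The class name LenientEntityDecoder and its tiny entity table are made up for illustration and are nowhere near the full XHTML or HTML5 entity list; the point is only that known references get decoded while unknown or malformed ones pass through untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class LenientEntityDecoder {

    // Deliberately tiny, illustrative table; a real decoder needs the full entity list.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">",
            "quot", "\"", "apos", "'", "isin", "\u2208");

    // Matches &name;, &#123; and &#x1F; style references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String decoded = resolve(m.group(1));
            // Unknown or invalid references are copied through unchanged.
            String replacement = (decoded != null) ? decoded : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }
}
```

With this sketch, LenientEntityDecoder.decode("They&apos;re") gives "They're", while "&&xyzzy;" comes back unchanged because neither the bare & nor &xyzzy; is a reference it recognises. What I'm looking for is an existing library that behaves like this with a complete entity table.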