Based on @Roman's post:
Create a class named HtmlEscapeUtils:
```java
import org.apache.commons.text.translate.AggregateTranslator;
import org.apache.commons.text.translate.CharSequenceTranslator;
import org.apache.commons.text.translate.EntityArrays;
import org.apache.commons.text.translate.LookupTranslator;
import org.apache.commons.text.translate.NumericEntityUnescaper;

public class HtmlEscapeUtils {

    /**
     * Same as UNESCAPE_HTML4, minus the basic XML entities
     * ({@code &quot; &amp; &lt; &gt;}), which are left untouched.
     *
     * @see org.apache.commons.text.StringEscapeUtils#UNESCAPE_HTML4
     */
    public static final CharSequenceTranslator UNESCAPE_HTML_SPECIFIC =
            new AggregateTranslator(
                    new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE),       // e.g. &auml;
                    new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE), // e.g. &euro;
                    new NumericEntityUnescaper());                               // e.g. &#48;

    /**
     * @param input HTML string, e.g. {@code &quot; &amp; &auml;}
     * @return XML string with the HTML 4 entities replaced, but the XML entities kept
     *         (e.g. {@code &quot;} and {@code &amp;})
     * @see org.apache.commons.text.StringEscapeUtils#unescapeHtml4(String)
     */
    public static String unescapeHtmlToXml(final String input) {
        return UNESCAPE_HTML_SPECIFIC.translate(input);
    }
}
```
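If you want to see what the AggregateTranslator above does without pulling in commons-text, here is a JDK-only sketch of the same idea (not the library's code): at each position it tries a named-entity lookup, then a numeric-reference decoder, and otherwise copies the character. The entity table is a hypothetical four-entry subset standing in for `EntityArrays.ISO8859_1_UNESCAPE` and `HTML40_EXTENDED_UNESCAPE`; `&amp;`, `&lt;`, `&gt;` and `&quot;` are left out on purpose, so they pass through unchanged.

```java
import java.util.Map;

// JDK-only sketch of the AggregateTranslator idea: at each index, try a named-entity
// lookup, then a numeric-reference decoder, otherwise copy the character as-is.
class SelectiveUnescapeSketch {

    // Hypothetical tiny subset of the ISO-8859-1 / HTML 4.0 entity tables.
    // &amp; &lt; &gt; &quot; are deliberately absent, so they stay escaped.
    private static final Map<String, String> NAMED = Map.of(
            "&Atilde;", "\u00C3",
            "&auml;", "\u00E4",
            "&nbsp;", "\u00A0",
            "&euro;", "\u20AC");

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int consumed = tryNamed(input, i, out);
            if (consumed == 0) {
                consumed = tryNumeric(input, i, out);
            }
            if (consumed == 0) {
                out.append(input.charAt(i)); // no entity here: copy one char
                consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }

    // Returns the number of chars consumed, or 0 if no known named entity starts at i.
    private static int tryNamed(String s, int i, StringBuilder out) {
        if (s.charAt(i) != '&') return 0;
        int semi = s.indexOf(';', i);
        if (semi < 0) return 0;
        String value = NAMED.get(s.substring(i, semi + 1));
        if (value == null) return 0;
        out.append(value);
        return semi + 1 - i;
    }

    // Decodes &#NNN; and &#xHHH; (semicolon required). Like NumericEntityUnescaper,
    // this decodes every numeric reference, including &#38; (which becomes a bare '&').
    private static int tryNumeric(String s, int i, StringBuilder out) {
        if (!s.startsWith("&#", i)) return 0;
        int start = i + 2;
        boolean hex = start < s.length() && (s.charAt(start) == 'x' || s.charAt(start) == 'X');
        if (hex) start++;
        int radix = hex ? 16 : 10;
        int end = start;
        while (end < s.length() && Character.digit(s.charAt(end), radix) >= 0) end++;
        if (end == start || end >= s.length() || s.charAt(end) != ';') return 0;
        int codePoint;
        try {
            codePoint = Integer.parseInt(s.substring(start, end), radix);
        } catch (NumberFormatException e) {
            return 0; // too many digits to fit in an int
        }
        if (!Character.isValidCodePoint(codePoint)) return 0;
        out.appendCodePoint(codePoint);
        return end + 1 - i;
    }

    public static void main(String[] args) {
        System.out.println(unescape("&Atilde; and &#48; but not &amp; or &gt;"));
    }
}
```

Running `main` prints `Ã and 0 but not &amp; or &gt;`. The ordering mirrors the commons-text version: lookups are tried before the numeric decoder, and anything no translator recognizes is copied through verbatim.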
Then use it in your program:
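```java
public class Main {
    public static void main(String[] args) {
        String source = "How can I unescape only HTML characters such as: &Atilde; and &#48;, "
                + "but not special characters such as &amp; or &gt;";
        String unescaped = HtmlEscapeUtils.unescapeHtmlToXml(source);
        // Prints "... such as: Ã and 0, but not special characters such as &amp; or &gt;"
        System.out.println(unescaped);
    }
}
```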
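One caveat: NumericEntityUnescaper decodes every numeric reference, so `&#38;` or `&#60;` in the input come out as a bare `&` or `<`, and the result is then not XML-safe. If that matters, here is a minimal JDK-only sketch (an assumption on my part, not a commons-text API) that decodes numeric references but keeps the ones for `&`, `<`, `>`, `"` and `'` exactly as written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// JDK-only sketch: decode numeric character references, but keep those that
// would produce an XML special character (& < > " ') so the output stays XML-safe.
class XmlSafeNumericUnescape {

    // &#NNN; or &#xHHH; with a required semicolon
    private static final Pattern NUMERIC = Pattern.compile("&#([xX][0-9a-fA-F]+|[0-9]+);");

    static String unescape(String input) {
        Matcher m = NUMERIC.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // default: keep the reference as written
            try {
                int cp = (body.charAt(0) == 'x' || body.charAt(0) == 'X')
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body);
                // Decode only valid code points that are not XML special characters
                if (Character.isValidCodePoint(cp) && "&<>\"'".indexOf(cp) < 0) {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // too many digits for an int: leave the reference unchanged
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("&#48; &#233; &#x20AC; &#38; &#60;"));
    }
}
```

Here `main` prints `0 é € &#38; &#60;`. To use the idea with commons-text you would swap it in for NumericEntityUnescaper in the aggregate, for example by wrapping it in your own CharSequenceTranslator subclass.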
You need the following Maven dependency:
```xml
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.9</version>
</dependency>
```