I am trying to use StringEscapeUtils.escapeHtml(String string) to convert the special characters on my web page to HTML entities. But it also escapes the five basic XML entities, which are <, >, ", ', and &, so my HTML no longer renders correctly.

So what I do after that is use StringEscapeUtils.unescapeXml(String string) to return <, >, ", ', and & back to their single character form.

Is there any other way to do this? Like not include the 5 entities I mentioned when StringEscapeUtils does HTML escaping?

Joshua Dannemann
mcspiral
  • Why do it at all? Why not use a character encoding that allows all those characters to be displayed as-is? – RealSkeptic Oct 19 '15 at 19:02
  • http://stackoverflow.com/questions/1265282/recommended-method-for-escaping-html-in-java – Petter Friberg Oct 19 '15 at 19:04
  • The html that I'm trying to escape will be sent as an email. So if the one receiving it has different character encoding, it displays as "?". That's why I'm trying to put in html entity form all characters. – mcspiral Oct 19 '15 at 19:25
  • An e-mail message should also include the character set. Unless the one receiving it has an email client from the '90s or so, then the content-type of the message part should do the trick. And of course, you should add the appropriate `meta` tag to the HTML. – RealSkeptic Oct 20 '15 at 16:55

1 Answer

You can build your own translator:

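// Commons Lang 3 (org.apache.commons.lang3.text.translate): the EntityArrays
// tables are methods there; in Commons Text they are constants, so drop the ().
import org.apache.commons.lang3.text.translate.*;

public static final CharSequenceTranslator ESCAPE_HTML4 = new AggregateTranslator(
        new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE()),
        new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE())
);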

This particular translator leaves out the EntityArrays.BASIC_ESCAPE() data, which is the table that maps <, >, ", and & to entities.

Thus, it will convert the special characters and leave the HTML tags untouched.

Convert your text variable using:

text = ESCAPE_HTML4.translate(text);
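
If you would rather not depend on the translator classes at all, a different technique gets a similar result: replace every non-ASCII character with a decimal numeric character reference and copy ASCII through unchanged. This is only a sketch under that assumption, not what StringEscapeUtils does internally; the class name NumericEntityEscaper and the method escapeNonAscii are made up for illustration. It produces numeric references such as &#233; rather than named entities such as &eacute;.

```java
// Dependency-free sketch. The class and method names are hypothetical and
// are not part of Apache Commons.
final class NumericEntityEscaper {

    // Replaces every code point above U+007F with a decimal numeric character
    // reference and copies ASCII, including <, >, ", ' and &, unchanged.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append((char) codePoint);
            }
            // Advance by 1 or 2 chars so a surrogate pair is read as one code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Markup and the existing &amp; stay as they are; only the accented letters change.
        System.out.println(escapeNonAscii("<p>Caf\u00e9 &amp; cr\u00e8me</p>"));
    }
}
```

Because the output is pure ASCII, it should survive whatever charset the receiving mail client assumes, though declaring the charset in the message, as suggested in the comments, is still worth doing.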