7

I want to replace certain characters with their respective HTML entities in an HTML response inside a filter. The characters include <, >, and &. I can't use replaceAll(), as it would replace every occurrence, even those that are part of HTML tags.

What is the best approach for doing so?

Peter Mortensen
user1448652
  • If a single string has already been formed that contains a mixture of HTML tags and standalone characters such as `<`, then it's probably too late. Can you not HTML encode the string *data* before it gets included inside tags? – Damien_The_Unbeliever Jun 11 '12 at 10:23
  • My application boundaries don't allow me to do it earlier :( – user1448652 Jun 11 '12 at 11:10
  • But just think - if it was *possible* to do this reliably with fully formed strings, you wouldn't *need* to do encoding - web browsers would use whatever this magical technique is to distinguish tags from general text. – Damien_The_Unbeliever Jun 11 '12 at 11:16
  • That is what I need to do. So far, what I am doing is traversing the HTML character by character and checking for '<' and '>'. Treating what lies between them as a tag (ignoring the attributes), I check it against a pre-defined tag list. If no match is found, I encode both '<' and '>'. I don't know whether it is the right approach... – user1448652 Jun 11 '12 at 12:43
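
The character-by-character approach described in the last comment can be sketched roughly as follows. This is only an illustration of the idea, not a reliable solution: as the comments above point out, a fully formed string cannot be split into tags and text with certainty. The class name `TagAwareEscaper`, the `KNOWN_TAGS` allow-list, and the rule that a tag must close before the next `<` are all assumptions made for this sketch.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: escapes <, > and & unless they belong to an allow-listed tag. */
final class TagAwareEscaper {

    // Assumed allow-list; a real application would supply its own.
    private static final Set<String> KNOWN_TAGS =
            Set.of("html", "body", "div", "span", "p", "a", "b", "i", "br");

    public static String escape(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '<') {
                int end = tagEnd(html, i);
                if (end >= 0) {
                    // Looks like a known tag: copy it verbatim, attributes included.
                    out.append(html, i, end + 1);
                    i = end + 1;
                    continue;
                }
                out.append("&lt;");
            } else if (c == '>') {
                out.append("&gt;");
            } else if (c == '&') {
                // Note: this also re-escapes entities that are already present.
                out.append("&amp;");
            } else {
                out.append(c);
            }
            i++;
        }
        return out.toString();
    }

    /** Returns the index of the closing '>' if an allow-listed tag starts at 'start', else -1. */
    private static int tagEnd(String html, int start) {
        int p = start + 1;
        if (p < html.length() && html.charAt(p) == '/') {
            p++; // closing tag such as </p>
        }
        int nameStart = p;
        while (p < html.length() && Character.isLetterOrDigit(html.charAt(p))) {
            p++;
        }
        String name = html.substring(nameStart, p).toLowerCase(Locale.ROOT);
        if (!KNOWN_TAGS.contains(name) || p >= html.length()) {
            return -1;
        }
        char next = html.charAt(p);
        if (next != '>' && next != '/' && !Character.isWhitespace(next)) {
            return -1; // the name merely starts with a known tag name
        }
        int close = html.indexOf('>', p);
        int nextOpen = html.indexOf('<', p);
        // Heuristic: a tag must close before another '<' appears.
        if (close < 0 || (nextOpen >= 0 && nextOpen < close)) {
            return -1;
        }
        return close;
    }

    public static void main(String[] args) {
        // prints: <p>1 &lt; 2 &amp; 3 &gt; 2</p>
        System.out.println(escape("<p>1 < 2 & 3 > 2</p>"));
    }
}
```

Known limits of this sketch: a `>` inside an attribute value ends the tag early, characters inside attributes are never escaped, and text that happens to look like an allow-listed tag is passed through unchanged.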

4 Answers

12

In Java, you can try StringEscapeUtils.escapeHtml() from Apache Commons Lang (legacy v2), or StringEscapeUtils.escapeHtml4() with commons-lang3.

Please note that this also converts characters such as à to &agrave;.

Peter Mortensen
sangupta
  • This is, IMHO, the best solution – Jean-Rémy Revy Jun 12 '12 at 11:39
  • Simple, clean and works just fine in Groovy as well. – The Unknown Dev Aug 13 '14 at 15:35
  • Also worth noting: if you're (already) using a web framework, there's a good chance a similar function is already built into the framework. Spring, for example, has HtmlUtils.htmlEscape(), documented here: http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html – Josh1billion Jul 27 '15 at 21:33
  • `org.apache.commons.lang3.StringEscapeUtils` is now deprecated; the replacement is `org.apache.commons.text.StringEscapeUtils`. – tigrou Oct 24 '22 at 09:44
1

If you're using a technology such as JSTL, you can simply print out the value using <c:out value="${myObject.property}"/> and it will be automatically escaped.

The attribute escapeXml is true by default.

escapeXml - Determines whether characters <,>,&,'," in the resulting string should be converted to their corresponding character entity codes. Default value is true.

http://docs.oracle.com/javaee/5/jstl/1.1/docs/tlddocs/

adarshr
0

When developing in the Spring ecosystem, you can use the HtmlUtils.htmlEscape() method.

For full apidocs, visit https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html

Mišo Stankay
0

Since most solutions reference a deprecated Apache class, here's one I've adapted from https://stackoverflow.com/a/16947646/3196753.

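// Assumes StringUtils from Apache Commons Lang 3 is on the classpath.
import org.apache.commons.lang3.StringUtils;

public class StringUtilities {

    // Characters to escape and their replacements; the two arrays are index-aligned.
    public static final String[] HTML_ENTITIES = {"&", "<", ">", "\"", "'", "/"};
    public static final String[] HTML_REPLACED = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;", "&sol;"};

    // replaceEach makes a single pass, so the "&" of an inserted entity is not escaped again.
    public static String escapeHtmlEntities(String text) {
        return StringUtils.replaceEach(text, HTML_ENTITIES, HTML_REPLACED);
    }
}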

Note: This is not a comprehensive solution (it's not context-aware -- may be too aggressive) but I needed a quick, effective solution.
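
If pulling in Apache Commons is not an option, the same idea can be sketched with the standard library only. The class name `PlainHtmlEscaper` is hypothetical, and the character set differs slightly from the code above by choice: it emits `&#39;` rather than `&apos;` (which HTML 4 does not define) and leaves `/` alone. Like the version above, it is not context-aware and escapes every occurrence.

```java
/** Hypothetical dependency-free variant of the escaping helper above. */
final class PlainHtmlEscaper {

    public static String escapeHtml(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        // Single pass over the input, so inserted entities are never escaped twice.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
        System.out.println(escapeHtml("<a href=\"x\">Tom & Jerry's</a>"));
    }
}
```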

tresf