The most of libraries offer escaping everything they can including hundreds of symbols and thousands of non-ASCII characters which is not what you want in UTF-8 world.
Also, as Jeff Williams noted, there's no single “escape HTML” option, there are several contexts.
Assuming you never use unquoted attributes, and keeping in mind that different contexts exist, it've written my own version:
private static final long TEXT_ESCAPE =
1L << '&' | 1L << '<';
private static final long DOUBLE_QUOTED_ATTR_ESCAPE =
TEXT_ESCAPE | 1L << '"';
private static final long SINGLE_QUOTED_ATTR_ESCAPE =
TEXT_ESCAPE | 1L << '\'';
private static final long ESCAPES =
DOUBLE_QUOTED_ATTR_ESCAPE | SINGLE_QUOTED_ATTR_ESCAPE;
// 'quot' and 'apos' are 1 char longer than '#34' and '#39'
// which I've decided to use
private static final String REPLACEMENTS = ""&'<";
private static final int REPL_SLICES = /* [0, 5, 10, 15, 19) */
5<<5 | 10<<10 | 15<<15 | 19<<20;
// These 5-bit numbers packed into a single int
// are indices within REPLACEMENTS which is a 'flat' String[]
private static void appendEscaped(
Appendable builder, CharSequence content, long escapes) {
try {
int startIdx = 0, len = content.length();
for (int i = 0; i < len; i++) {
char c = content.charAt(i);
long one;
if (((c & 63) == c) && ((one = 1L << c) & escapes) != 0) {
// -^^^^^^^^^^^^^^^ -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// | | take only dangerous characters
// | java shifts longs by 6 least significant bits,
// | e. g. << 0b110111111 is same as >> 0b111111.
// | Filter out bigger characters
int index = Long.bitCount(ESCAPES & (one - 1));
builder.append(content, startIdx, i /* exclusive */).append(
REPLACEMENTS,
REPL_SLICES >>> (5 * index) & 31,
REPL_SLICES >>> (5 * (index + 1)) & 31
);
startIdx = i + 1;
}
}
builder.append(content, startIdx, len);
} catch (IOException e) {
// typically, our Appendable is StringBuilder which does not throw;
// also, there's no way to declare 'if A#append() throws E,
// then appendEscaped() throws E, too'
throw new UncheckedIOException(e);
}
}
Consider copy-pasting from Gist without line length limit.
UPD: As another answer suggests, >
escaping is not necessary; also, "
within attr='…'
is allowed, too. I've updated the code accordingly.
You may check it out yourself:
<!DOCTYPE html>
<html lang="en">
<head><title>Test</title></head>
<body>
<p title="<"I'm double-quoted!">"><"Hello!"></p>
<p title='<"I'm single-quoted!">'><"Goodbye!"></p>
</body>
</html>