5

I have a string that users can enter on the internet, and it is currently not protected against XSS attacks. I would like to be able to replace the < and > symbols, commonly known as 'less than', 'greater than', 'angle brackets', etc.

I am sure this has been asked a million times, but I can't find a simple answer. I assume regex is the way forward, but I can't work out how to match these characters.

Dech

4 Answers

8

You really should use StringEscapeUtils.escapeHtml() from Apache Commons Lang instead of regex for this. All you need to do is:

import org.apache.commons.lang.StringEscapeUtils; // Commons Lang 2.x package
String escaped = StringEscapeUtils.escapeHtml(input);

The best practice to protect against XSS is to escape all HTML special characters into entities, and this method handles those cases for you. Otherwise you'll be writing, testing and maintaining your own code to do what has already been done. See the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet for more details.
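To make concrete what escaping into entities amounts to, here is a minimal hand-rolled sketch in plain Java. It is only an illustration, not the Commons Lang implementation: the class `HtmlEscapeSketch` and the method `escapeBasicHtml` are hypothetical names, and it covers just five characters, whereas the library handles many more entities.

```java
// Illustrative sketch only; prefer a maintained library in real code.
final class HtmlEscapeSketch {
    private HtmlEscapeSketch() {}

    /** Hypothetical helper: replaces &, <, >, " and ' with HTML entities. */
    static String escapeBasicHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The markup is turned into inert text a browser will display, not execute.
        System.out.println(escapeBasicHtml("<script>alert('x')</script>"));
        // prints: &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
    }
}
```

Because each character is examined once, an ampersand that belongs to a freshly written entity is never escaped a second time.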

Gaʀʀʏ
WhiteFang34
  • Just downloaded the library. What's the difference between escapeHtml3() and escapeHtml4()? – Dech Apr 13 '11 at 14:13
  • It appears that in 3.0-beta they've split the method into two parts. The [javadoc](http://commons.apache.org/lang/api-3.0-beta/org/apache/commons/lang3/StringEscapeUtils.html#escapeHtml4%28java.lang.String%29) only says that `escapeHtml4()` is for HTML 4.0 entities. The [changelog](http://commons.apache.org/lang/upgradeto3_0.html) mentions support for more entities. You might want to stay with the 2.6 stable release until 3.0 is out of beta; it covers the important cases that you need for now. – WhiteFang34 Apr 13 '11 at 22:36
3

Java regex doesn't require any special treatment for angle brackets, and String.replace() does a literal (non-regex) replacement anyway. This should work fine:

myString.replace("<", "less than").replace(">", "greater than");
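As a small sketch of that point, the two hypothetical helpers below do the same replacement once with `String.replace` (literal) and once with `String.replaceAll` (regex). The class and method names are made up for illustration, and swapping in the entities `&lt;` and `&gt;` for the words used above is an assumption about what you would want in HTML output. Strings are immutable, so the result has to be captured rather than discarded.

```java
// Illustrative sketch: angle brackets are ordinary characters in a Java regex.
final class AngleBracketReplaceSketch {
    private AngleBracketReplaceSketch() {}

    /** Literal replacement; no regex is involved. */
    static String viaReplace(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Regex replacement; "<" and ">" need no escaping in the pattern. */
    static String viaRegex(String s) {
        return s.replaceAll("<", "&lt;").replaceAll(">", "&gt;");
    }

    public static void main(String[] args) {
        String input = "a < b > c";
        // Both calls return a new string; input itself is unchanged.
        System.out.println(viaReplace(input)); // prints: a &lt; b &gt; c
        System.out.println(viaRegex(input));   // prints: a &lt; b &gt; c
    }
}
```

This only touches the two bracket characters; ampersands and quotes pass through unchanged.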

Hope that helps.

-tjw

Travis Webb
1

As an alternative to regex, you can use a utility class such as Apache Commons StringEscapeUtils to encode your HTML strings when they are posted back to the server, before storing them in the database or re-sending them as output.

KP Taylor
1

Since you tagged this as JSP, I'd like to add that the normal approach to escaping HTML/XML in JSP is to use the JSTL <c:out> tag or the fn:escapeXml() function.

E.g.

<c:out value="${user.name}" />
<input type="text" name="name" value="${fn:escapeXml(user.name)}" />

No need for Apache Commons Lang. Plus, escaping should really be done on the view side, not on the model/controller side.

Community
BalusC