39

I have gone through the OWASP Top Ten vulnerabilities and found that Cross-Site Scripting (XSS) is the one we need to pay attention to. There were a few recommended solutions. One states: do not use "blacklist" validation to detect XSS in input or to encode output. Searching for and replacing just a few characters (< and > and other similar characters or phrases such as script) is weak and has been attacked successfully. Even an unchecked "<b>" tag is unsafe in some contexts. XSS has a surprising number of variants that make it easy to bypass blacklist validation. Another recommends strong output encoding: ensure that all user-supplied data is appropriately entity encoded (either HTML or XML depending on the output mechanism) before rendering. So, which is the best way to prevent cross-site scripting: validating and replacing the input, or encoding the output?
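To illustrate why the blacklist approach quoted above is considered weak, here is a minimal sketch. The class and the filter are hypothetical and deliberately naive; they are not taken from OWASP or from any library.

```java
// Hypothetical demo: a deliberately naive blacklist filter.
final class BlacklistBypassDemo {

    // Strips the literal substrings "<script>" and "</script>" in a single pass.
    static String naiveFilter(String input) {
        return input.replace("<script>", "").replace("</script>", "");
    }

    public static void main(String[] args) {
        // Nesting the blacklisted token makes the filter rebuild it.
        String nested = "<scr<script>ipt>alert(1)</scr</script>ipt>";
        System.out.println(naiveFilter(nested));

        // An event-handler payload never contains the blacklisted token at all.
        String handler = "<img src=x onerror=alert(1)>";
        System.out.println(naiveFilter(handler));
    }
}
```

Both payloads survive the filter: the first one is reassembled into a working script tag, and the second one passes through unchanged.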

Mdhar9e
  • Possible duplicate: http://stackoverflow.com/questions/24723/best-regex-to-catch-xss-attack-in-java – aem Jul 21 '09 at 15:10
  • Duplicate: https://stackoverflow.com/questions/2658922/xss-prevention-in-jsp-servlet-web-application/ – BalusC Sep 20 '20 at 09:34

3 Answers

43

The normal practice is to HTML-escape any user-controlled data when redisplaying it in JSP, not when processing the submitted data in the servlet, nor when storing it in the DB. In JSP you can use the JSTL <c:out> tag or the fn:escapeXml function for this (to install JSTL, just drop jstl-1.2.jar in /WEB-INF/lib). E.g.

<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
...
<p>Welcome <c:out value="${user.name}" /></p>

and

<%@ taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
...
<input name="username" value="${fn:escapeXml(param.username)}">

That's it. No need for a blacklist. Note that user-controlled data covers everything which comes in via an HTTP request: the request parameters, body and headers(!!).
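For readers who want to see what HTML-escaping on output amounts to, here is a rough sketch in plain Java. It assumes the five characters &, <, >, " and ' are the ones replaced; the class name, method name and exact entity spellings are illustrative and this is not JSTL's implementation. In a JSP you would keep using <c:out> or fn:escapeXml rather than a hand-written helper.

```java
// Hypothetical helper approximating HTML-escaping at output time.
final class HtmlEscapeSketch {

    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The markup characters are neutralised, so a browser renders them as text.
        System.out.println(escapeHtml("<script>alert('x')</script>"));
    }
}
```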

If you HTML-escape it while processing the submitted data and/or storing it in the DB as well, then the escaping is spread all over the business code and/or the database. That only causes maintenance trouble, and you risk double-escaping (or worse) when you do it in different places (e.g. & would become &amp;amp; instead of &amp;, so that the end user would literally see &amp; instead of & in the view). The business code and the DB are in turn not vulnerable to XSS. Only the view is. You should therefore escape only there, in the view.
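The double-escaping problem described above can be reproduced with a few lines of Java. This is only a sketch: the escape method is a hypothetical stand-in for whatever escaping the storage layer and the view would each apply.

```java
// Hypothetical demo of escaping the same value twice.
final class DoubleEscapeDemo {

    // Minimal stand-in escaper; '&' must be handled first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String submitted = "Tom & Jerry";

        // Escaped once, on the way into the database.
        String stored = escape(submitted);

        // Escaped again, on the way out in the view.
        String rendered = escape(stored);

        System.out.println(stored);    // Tom &amp; Jerry
        System.out.println(rendered);  // Tom &amp;amp; Jerry
    }
}
```

A browser decodes one level of entities, so the second value is displayed as the literal text "Tom &amp; Jerry" instead of "Tom & Jerry".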

BalusC
  • 2
    +1 ! If only we could somehow configure the EL expression to default to html-escaping, then we wouldnt need to have lots of c:out everywhere :-) – Bertie Jul 03 '12 at 04:20
  • Ah, you have answered about this issue in here : http://stackoverflow.com/questions/5887037/escape-html-entities-in-jsp-jspx-no-solution-for-problem-that-should-not-even – Bertie Jul 03 '12 at 04:22
  • @Albert: if I'm not mistaken, you're already using JSF/Facelets, right? All XSS escaping is already done for you then, yes. – BalusC Jul 03 '12 at 04:26
  • Wow, you still remember ! Yes i was with Facelets, but currently on a project using SpringMVC, and with JSP as the view. So my current conclusion is to c:out when displaying user inputs, and EL for the rests. – Bertie Jul 03 '12 at 07:33
  • What if there is something in the output value like, it would display as it is without taking effect. – Narayana Nagireddi Oct 04 '12 at 14:59
  • @Altair: use `Jsoup#clean()` to remove potentially malicious HTML from the string. See also http://stackoverflow.com/questions/7722159/csrf-xss-and-sql-injection-attack-prevention-in-jsf/7725675#7725675 – BalusC Oct 04 '12 at 15:03
  • But I also have an anchor tag with some JavaScript in the value attribute, so what could I do in this case? – Narayana Nagireddi Oct 05 '12 at 00:30
  • What happens if you have a JSONP API. You do not know how your customers are going to use it. Wouldn't it be better to remove vulnerabilities when storing the data in this case? – Alessandro Giannone Mar 24 '13 at 03:44
  • I am having the same issue and questioning the validity of your existing answer as it's from 2009. I opened a new question, can you answer please? My question is here - https://stackoverflow.com/q/63969509/1379286 – PeakGen Sep 20 '20 at 02:06
  • @LemonJuice: answer is still valid these days. – BalusC Sep 20 '20 at 09:30
  • @BalusC: Awesome. So basically, having JSTL is more than enough , isnt it? Before going to an explanation for the security auditors, I must know the facts, thats why. – PeakGen Sep 21 '20 at 08:32
6

Use both. In fact, refer to a guide such as the OWASP XSS Prevention Cheat Sheet for the cases where output encoding and input validation each apply.

Input validation helps when you cannot rely on output encoding in certain cases. For instance, you're better off validating inputs appearing in URLs rather than encoding the URLs themselves (Apache will not serve a URL that is url-encoded). Or for that matter, validate inputs that appear in JavaScript expressions.
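As one possible shape for validating input that ends up in a URL, the sketch below accepts only absolute http and https links. The class name and the policy are assumptions made for illustration; a real application would choose its own rules.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical validator for user-supplied links.
final class UrlValidationSketch {

    // Assumed policy: only these schemes may be rendered as links.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    static boolean isSafeLink(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            // Reject relative references, other schemes, and URIs without a host.
            return scheme != null
                    && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Anything that does not parse as a URI is rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSafeLink("https://example.com/profile?id=1"));
        System.out.println(isSafeLink("javascript:alert(1)"));
    }
}
```

Checking the scheme against a short allow-list is what keeps a javascript: URL out of an href, something HTML-escaping alone would not prevent.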

Ultimately, a simple rule of thumb will help: if you do not trust user input enough, or if you suspect that certain sources can result in XSS attacks despite output encoding, validate it against a whitelist.
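A whitelist check can be as small as a single regular expression. The sketch below assumes a hypothetical username policy (3 to 20 ASCII letters, digits or underscores); the pattern and names are illustrative, not a recommendation from OWASP or ESAPI.

```java
import java.util.regex.Pattern;

// Hypothetical whitelist validator for a username field.
final class WhitelistValidationSketch {

    // Assumed policy: 3 to 20 characters, ASCII letters, digits and underscore only.
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9_]{3,20}");

    static boolean isValidUsername(String input) {
        // matches() requires the whole input to fit the pattern.
        return input != null && USERNAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUsername("alice_01"));
        System.out.println(isValidUsername("x onfocus=alert(1)"));
    }
}
```

The point of the whitelist is that it names what is allowed and rejects everything else, instead of trying to enumerate every dangerous character or phrase.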

Do take a look at the OWASP ESAPI source code to see how output encoders and input validators are written in a security library.

Vineet Reynolds
0

My preference is to encode all non-alphanumeric characters as HTML numeric character entities. Since almost all, if not all, attacks require non-alphanumeric characters (like <, ", etc.), this should eliminate a large chunk of dangerous output.

The format is &#N;, where N is the decimal numeric value of the character (in Java, converting the character to an int gives you that decimal value). For example:

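// Java: emit every non-alphanumeric code point as a decimal entity (&#N;)
StringBuilder safestrbuf = new StringBuilder(string.length() * 4);
string.codePoints().forEach(cp -> {
  if (Character.isLetterOrDigit(cp)) safestrbuf.appendCodePoint(cp);  // letters and digits pass through
  else safestrbuf.append("&#").append(cp).append(';');                // everything else is encoded
});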

You will also need to be sure that you are encoding immediately before outputting to the browser, to avoid double-encoding, or encoding for HTML but sending to a different location.

  • Space isn't alphanumeric, right? – Tom Hawtin - tackline Jul 21 '09 at 23:28
  • 1
    Correct. Space is not alphanumeric, and, given this very draconian algorithm, will encode to &#32;. This may seem like it's never necessary, but think of a case where only known dangerous characters of single & double quote are handled, if the output is constructed from the following JSP snippet: > if the user were to submit a value such as x onfocus=alert(1) s/he'd be able to execute XSS. P.S. is there any way to format comments? I'm trying to pretty print, but can't... –  Jul 22 '09 at 22:11