3

I want to sanitize a simple text field containing a person's name, to protect against XSS and similar attacks. Stack Overflow pretty much says I must whitelist, and I don't understand why. If I simply remove all `<` and `>` from the input value, or replace them with `&lt;` and `&gt;`, doesn't that rule out code injection? Or am I missing something? Perhaps you only need to whitelist in more complex scenarios where you have to allow angle brackets?

Sorry if it's a silly question, but it's important to get this right.

Irina Rapoport
  • 1
    The entities are `&lt;` and `&gt;`, not `&ls;`. – Colonel Thirty Two May 18 '15 at 02:29
  • Oops, corrected, thank you! – Irina Rapoport May 18 '15 at 02:31
  • 1
    Depending on how you choose to use the name, you may need to replace quotes and/or apostrophes as well (for example, if you were to use the name as the value of an `input` element). They can be replaced with `&quot;` and `&#39;`. You should also replace the ampersand with `&amp;`. If your language has a method for escaping HTML, use it (for example [HttpUtility.HtmlEncode](https://msdn.microsoft.com/en-us/library/system.web.httputility.htmlencode%28v=vs.110%29.aspx) in .NET or [htmlspecialchars](http://php.net/htmlspecialchars) in PHP). – Lithis May 18 '15 at 02:56
  • I was going to ask about that. What are the methods for escaping HTML in Java and Javascript? – Irina Rapoport May 18 '15 at 03:06
  • 1
    I don’t know of any built-in methods in Java or JavaScript, but there are ways. For Java, see [Recommended method for escaping HTML in Java](http://stackoverflow.com/questions/1265282/recommended-method-for-escaping-html-in-java), and for JavaScript, see [HTML-encoding in JavaScript/jQuery](http://stackoverflow.com/questions/1219860/html-encoding-in-javascript-jquery). – Lithis May 18 '15 at 03:10

2 Answers

2

Whether to whitelist or encode depends on how you want to use the text.

If you intend to treat the input as plain text, then encoding special characters is enough: any HTML entered will be displayed as literal text rather than interpreted as markup. This holds only as long as you never let unencoded text end up anywhere in your HTML output. (That includes making sure any other systems you interface with don't inappropriately use the unencoded text.)
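As a rough illustration of the plain-text case, here is a minimal sketch in Python (chosen only for illustration; the question does not use it). `html.escape` is a real standard-library function, while `render_name` and the surrounding `<p>` template are hypothetical names made up for this example.

```python
import html


def render_name(name: str) -> str:
    """Build an HTML fragment that treats the name as plain text.

    render_name and the <p> template are hypothetical, for illustration only.
    """
    # html.escape replaces &, < and >; with quote=True (the default)
    # it also replaces " and ' so the result is safe in quoted attributes.
    return "<p>Hello, " + html.escape(name) + "!</p>"


if __name__ == "__main__":
    # The markup is shown as literal text instead of being executed.
    print(render_name("<script>alert(1)</script>"))
    # prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

The point is that the escaping happens at output time, right where the text is placed into HTML, so the stored value can stay unmodified.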

If you want to allow some markup in the input, such as text styling or links, then you must whitelist the tags that you allow and get rid of all others.
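To make the whitelisting idea concrete, here is a small sketch built on Python's standard `html.parser.HTMLParser`. The set `ALLOWED_TAGS`, the class `WhitelistSanitizer`, and the function `sanitize` are all hypothetical names for this example, and dropping every attribute is a simplifying assumption. For real use, a maintained sanitizer library is probably a safer choice than hand-rolled code like this.

```python
import html
from html.parser import HTMLParser

# Hypothetical whitelist: only simple text-styling tags are allowed.
ALLOWED_TAGS = {"b", "i", "em", "strong"}


class WhitelistSanitizer(HTMLParser):
    """Re-emit whitelisted tags (without attributes) and escape all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Assumption: all attributes are dropped, so onclick= etc. vanish.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text content is always re-encoded before it is emitted.
        self.out.append(html.escape(data))


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<b onclick="x()">Hi</b> <script>alert(1)</script>'))
```

Tags outside the whitelist are removed, and whatever text they contained survives only as escaped plain text.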

Lithis
1

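No, it's not sufficient: if you were to include the person's name in an HTML attribute, you would also need to escape any double quotes it contains. Otherwise a quote in the name could end the attribute value early and let the rest of the input be read as new attributes.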
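A short Python sketch of that attribute problem follows. `strip_angle_brackets`, `input_tag`, and the `payload` string are hypothetical and exist only for this example; `html.escape` is the real standard-library function.

```python
import html


def strip_angle_brackets(name: str) -> str:
    """The approach from the question: just remove < and >."""
    return name.replace("<", "").replace(">", "")


def input_tag(value: str) -> str:
    """Hypothetical template placing the value in a double-quoted attribute."""
    return '<input type="text" name="name" value="' + value + '">'


# A "name" with no angle brackets that still breaks out of the attribute.
payload = '" onfocus="alert(1)" autofocus="'

unsafe = input_tag(strip_angle_brackets(payload))
safe = input_tag(html.escape(payload, quote=True))

if __name__ == "__main__":
    print(unsafe)
    # prints: <input type="text" name="name" value="" onfocus="alert(1)" autofocus="">
    print(safe)
    # prints: <input type="text" name="name" value="&quot; onfocus=&quot;alert(1)&quot; autofocus=&quot;">
```

In the first result the quote closes `value` and an event handler is injected even though no `<` or `>` was present; in the second the quotes are encoded and the whole input stays inside the attribute value.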

Bryce