
I have seen a lot of conflicting answers about this. Many people like to point out that PHP functions alone will not protect you from XSS.

Exactly what XSS can make it through htmlspecialchars, and what can make it through htmlentities?

I understand the difference between the functions, but not the different levels of XSS protection each one leaves you with. Could anyone explain?

stuckinphp
  • See http://stackoverflow.com/questions/1891392/is-htmlentities-bullet-proof regarding htmlentities also see: http://stackoverflow.com/questions/71328/what-are-the-best-practices-for-avoid-xss-attacks-in-a-php-site for more information on the subject. You can also look to the right hand side of this page under "Related" for more relevant / similar topics. – Jim Sep 02 '10 at 01:38

3 Answers


htmlspecialchars() will NOT protect you against UTF-7 XSS exploits, which still plague Internet Explorer, even IE 9: http://securethoughts.com/2009/05/exploiting-ie8-utf-7-xss-vulnerability-using-local-redirection/

For instance:

```php
<?php
$_GET['password'] = 'asdf&ddddd"fancy˝quotes˝';

echo htmlspecialchars($_GET['password'], ENT_COMPAT | ENT_HTML401, 'UTF-8') . "\n";
// Output: asdf&amp;ddddd&quot;fancyË

echo htmlentities($_GET['password'], ENT_COMPAT | ENT_HTML401, 'UTF-8') . "\n";
// Output: asdf&amp;ddddd&quot;fancy&Euml;quotes
```

You should always use htmlentities, and only rarely htmlspecialchars, when sanitizing user input. Also, you should always strip tags first. And for really important and secure sites, you should NEVER trust strip_tags(); use HTMLPurifier for PHP.
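A minimal sketch of where the two functions actually differ (the sample input is hypothetical, and the file is assumed to be saved as UTF-8): both convert the HTML-special characters the same way, and htmlentities() additionally rewrites characters such as `é` that have a named entity in the chosen doctype table.

```php
<?php
// Hypothetical sample input containing markup and one non-ASCII character.
$input = '<a href="x">café</a>';

// htmlspecialchars() only converts the HTML-special characters & " ' < >.
$special = htmlspecialchars($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// htmlentities() also converts every character that has a named entity
// in the HTML 4.01 table, e.g. "é" becomes "&eacute;".
$entities = htmlentities($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $special, "\n";   // &lt;a href=&quot;x&quot;&gt;café&lt;/a&gt;
echo $entities, "\n";  // &lt;a href=&quot;x&quot;&gt;caf&eacute;&lt;/a&gt;
```

In this run the markup characters are escaped identically by both calls; only the way `é` is written differs.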

Theodore R. Smith
  • Thanks for the link, it explains a lot more. htmlspecialchars used to be fine when the charset could be assured. Now that it can't, the only option is htmlentities. – stuckinphp Sep 02 '10 at 02:15
  • If the page declares its encoding (and it always should!), then there is no risk of a UTF-7 exploit. I strongly disagree about the use of `strip_tags`. It malforms some legitimate input (`2 < 4`, ``) but doesn't improve the security of `htmlspecialchars`-escaped data. – Kornel Dec 12 '10 at 00:48
  • You didn't read the IE 9 bug report. It plainly states that all IE versions are prone to it even with the charset specifically set via either HTTP headers or in the HTML. – Theodore R. Smith Mar 19 '11 at 02:01
  • If your page/browser is vulnerable to the UTF-7 issue, `htmlentities` isn't going to help you any more than `htmlspecialchars` will. Both of them will interpret the UTF-7 encodings of `<` and `>` as just "safe" ASCII chars and pass them through. Same problem with `strip_tags`. – John Flatness May 07 '11 at 00:27
  • @TheodoreR.Smith: The solution is to always pass the character set parameter to `htmlspecialchars`, and set the proper character set on the page. If the browser ignores the proper character set, there's not much you can do... – ircmaxell Feb 23 '12 at 21:59
  • @NikiC OK. Please post your alternative answer. – Theodore R. Smith Mar 02 '12 at 19:11
  • @NikC Let it be stated for the record that you seem to just be a troll ;-) – Theodore R. Smith Apr 11 '12 at 21:40
  • @TheodoreR.Smith If the text is English, `htmlspecialchars` and `htmlentities` should produce the same result. But if some characters are Chinese, using `htmlentities` would be a nightmare. – alwaysday1 May 21 '13 at 09:44
  • Version of that link that works: http://web.archive.org/web/20130326052725/http://securethoughts.com/2009/05/exploiting-ie8-utf-7-xss-vulnerability-using-local-redirection/ – Hut8 Feb 07 '14 at 21:21
  • Also note, your examples are incorrect: [Proof](http://3v4l.org/XYedd). Your exact code produces identical results on all versions of PHP (all the way back to 4.3, though with notices for the lack of ENT_HTML401). – ircmaxell Jul 21 '14 at 20:24
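John Flatness's point about UTF-7 can be checked with a short sketch. The payload is a commonly cited UTF-7 spelling of `<script>`, not one taken from the linked report, and the optional decoding step assumes the mbstring extension is loaded.

```php
<?php
// UTF-7 writes "<" as "+ADw-" and ">" as "+AD4-", so the payload is plain
// ASCII letters, digits and punctuation that neither function escapes.
$utf7Payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

$special  = htmlspecialchars($utf7Payload, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($utf7Payload, ENT_QUOTES, 'UTF-8');

var_dump($special === $utf7Payload);   // bool(true): passed through unchanged
var_dump($entities === $utf7Payload);  // bool(true): passed through unchanged

// Decoding the same bytes as UTF-7 (what a browser that sniffs UTF-7 would
// do) turns them back into real tags. Requires the mbstring extension.
if (function_exists('mb_convert_encoding')) {
    echo mb_convert_encoding($utf7Payload, 'UTF-8', 'UTF-7'), "\n";
}
```

This is why the fix discussed above is declaring the charset, not switching between the two escaping functions.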

If PHP's header() function is used to set the charset

```php
header('Content-Type: text/html; charset=utf-8');
```

then htmlspecialchars and htmlentities should both be safe for output of HTML because XSS cannot then be achieved using UTF-7 encodings.
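A minimal sketch of that setup, assuming the whole page is UTF-8. The helper name `e()` is hypothetical; the point is that the charset sent in the header and the charset passed to htmlspecialchars() agree.

```php
<?php
// Hypothetical helper: escape a value for an HTML text node or a quoted
// attribute on a page served as UTF-8.
function e(string $value): string
{
    // ENT_QUOTES escapes both " and ', ENT_SUBSTITUTE replaces invalid UTF-8
    // sequences with U+FFFD instead of returning an empty string, and
    // 'UTF-8' must match the charset declared in the header below.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

// Declare the charset before any output (headers are not sent under the CLI).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$comment = "<img src=x onerror='alert(1)'>"; // hypothetical user input
echo '<p title="', e($comment), '">', e($comment), "</p>\n";
```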

Please note that these functions should not be used to output values into JavaScript or CSS, because they do not escape everything that is significant in those contexts: an attacker could enter characters that break out of the JavaScript or CSS and put your site at risk. Please see the OWASP XSS Prevention Cheat Sheet for how to handle these situations appropriately.
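For the JavaScript case, one common approach (a swapped-in technique, not something this answer prescribes) is to emit the value with json_encode() and its JSON_HEX_* flags, which produces a valid JavaScript literal in which `<`, `>`, `&`, `'` and `"` appear only as `\u00XX` escapes. This sketch assumes PHP 7.3+ for JSON_THROW_ON_ERROR and does not cover CSS.

```php
<?php
// Hypothetical input that tries to close the surrounding <script> block.
$userName = "</script><script>alert('x')</script>";

// JSON_HEX_TAG/AMP/APOS/QUOT turn < > & ' " into \u003C, \u003E, \u0026,
// \u0027 and \u0022, so the literal cannot terminate the script element.
$jsLiteral = json_encode(
    $userName,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
);

echo "<script>var userName = ", $jsLiteral, ";</script>\n";
```

The browser's JavaScript parser decodes the escapes back into the original string, so the value survives intact without ever appearing as markup.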

SherylHohman
SilverlightFox

I'm not sure if you have found the answer you were looking for, but I am also looking for an HTML cleaner. I am building an application that needs to accept HTML code, possibly even JavaScript or other languages, and store it in a MySQL database without causing problems or opening up XSS issues. I've found HTML Purifier, and it appears to be the most developed and still-maintained tool for cleaning up user-submitted content on a PHP system. The page linked is their comparison page, which explains why theirs or another tool could be useful. Hope this helps!

Community