
Questions:

What are the best safe1(), safe2(), safe3(), and safe4() functions to avoid XSS for UTF8 encoded pages? Is it also safe in all browsers (specifically IE6)?

<body><?php echo safe1($xss)?></body>

<body id="<?php echo safe2($xss)?>"></body>

<script type="text/javascript">
  var a = "<?php echo safe3($xss)?>";
</script>

<style type="text/css">
  .myclass {width:<?php echo safe4($xss)?>}
</style>

.

Many people say the absolute best that can be done is:

// safe1 & safe2
$s = htmlentities($s, ENT_QUOTES, "UTF-8");

// But how would you compare the above to:
//    https://github.com/shadowhand/purifier
// OR http://kohanaframework.org/3.0/guide/api/Security#xss_clean
// OR is there an even better if not perfect solution?

.

// safe3
$s = mb_convert_encoding($s, "UTF-8", "UTF-8");
$s = htmlentities($s, ENT_QUOTES, "UTF-8");

// How would you compare this to using mysql_real_escape_string($s)?
// (Yes, I know this is a DB function)
// Some other people also recommend calling json_encode() before passing to htmlentities
// What's the best solution?

.

There are a hell of a lot of posts about PHP and XSS. Most just say "use HTMLPurifier" or "use htmlspecialchars", or are wrong. Others say use OWASP -- but it is EXTREMELY slow. Some of the good posts I came across are listed below:

Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?

XSS Me Warnings - real XSS issues?

CodeIgniter - why use xss_clean

user324289
  • Non-escaped chars are not the only thing you should worry about. In your `var a = "";` you also need to strip out all newline characters. – zerkms May 13 '11 at 03:36

2 Answers

safe2() is clearly htmlspecialchars()

In place of safe1() you should really be using HTMLPurifier to sanitize complete blobs of HTML. It strips unwanted attributes, tags and in particular anything javascriptish. Yes, it's slow, but it covers all the small edge cases (even for older IE versions) which allow for safe HTML user snippet reuse. But check out http://htmlpurifier.org/comparison for alternatives. -- If you really only want to display raw user text there (no filtered html), then htmlspecialchars(strip_tags($src)) would actually work fine.
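For the plain-text case, a minimal sketch might look like the following. The names safe1() and safe2() are just the placeholders from the question, not a library API, and the sketch assumes the page is served as UTF-8 and that safe2() output always lands inside a quoted attribute. Note that strip_tags() is lossy: it can also drop legitimate text that merely looks like a tag.

```php
<?php
// Hypothetical helpers named after the placeholders in the question.

// safe1: plain user text between tags (no user HTML allowed).
function safe1(string $s): string
{
    // Drop anything tag-like, then escape what is left.
    return htmlspecialchars(strip_tags($s), ENT_QUOTES, 'UTF-8');
}

// safe2: user text inside a *quoted* attribute value.
function safe2(string $s): string
{
    // ENT_QUOTES escapes both " and ' so the value cannot close the attribute.
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

echo '<body id="' . safe2('x" onload="alert(1)') . '">';
echo safe1('<b>hi</b> & bye');
echo '</body>';
```

If you need to keep user-supplied markup rather than strip it, this sketch does not apply and a purifier is the right tool.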

safe3() screams regular expression. Here you can really only apply a whitelist to whatever you actually want:

var a = "<?php echo preg_replace('/[^-\w\d .,]/', "", $xss)?>";

You can of course use json_encode here to get a perfectly valid JS syntax and variable. But then you've just delayed the exploitability of that string into your JS code, where you then have to babysit it.
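If you do go the json_encode route, a hedged sketch could look like this. safe3() is again only the placeholder name from the question. The JSON_HEX_* flags are real json_encode() options (PHP 5.3+); returning an empty string literal on failure is my own assumption about what you would want there. The function returns the whole JS literal including its quotes, so the template must not add its own.

```php
<?php
// Hypothetical helper: returns a complete JS string literal, quotes included.
function safe3(string $s): string
{
    // Hex-escape <, >, &, ' and " so the literal cannot close the <script>
    // block or break out of an attribute if it is reused elsewhere.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    $json = json_encode($s, $flags);
    if ($json === false) {
        // json_encode() fails on invalid UTF-8; assumed fallback: empty string.
        return '""';
    }
    return $json;
}

// Template usage (note: no quotes around the PHP tag):
//   var a = <?php echo safe3($xss) ?>;
echo 'var a = ' . safe3("line1\nline2</script>") . ";\n";
```

Newlines come out as `\n` escapes, so the literal stays on one line. What the script later does with `a` (innerHTML, eval, document.write) is still your problem, which is the "babysitting" mentioned above.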


Is it also safe in all browsers (specifically IE6)?

If you specify the charset explicitly, then IE won't do its awful content detection magic, so UTF7 exploits can be ignored.
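Specifying the charset explicitly can be done with a response header sent before any output. A small sketch, where send_utf8_html_header() is a made-up helper name and only header() and headers_sent() are actual PHP functions:

```php
<?php
// Hypothetical helper: declare the charset explicitly in the response header.
function send_utf8_html_header(): string
{
    $header = 'Content-Type: text/html; charset=UTF-8';
    // header() only works before any body output has been sent.
    if (!headers_sent()) {
        header($header);
    }
    return $header;
}

send_utf8_html_header();
```

A matching `<meta>` charset declaration early in `<head>` is a common belt-and-braces addition.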

alex
mario
  • For safe2: why is htmlspecialchars($s, ENT_QUOTES, "UTF-8") better than htmlentities($s, ENT_QUOTES, "UTF-8")? Are they equivalent but the former is just faster? I've heard that the latter is better as it protects against foreign character XSS attacks. Can you also expand on what you mean by 'but then you've just delayed the exploitability of that string into your JS code' by giving me an example? Thanks! – user324289 May 13 '11 at 13:17
  • No, they're mostly the same. `htmlentities` also encodes some other characters. But since pretend-XHTML is still widespread you should prefer just `htmlspecialchars` which only uses XML escapes, not the potentially invalid HTML entities. Foreign characters are not so much an issue for XSS exploits. More severe are unquoted attributes, because not only `"` and `'` are problematic there, but also `@ ! %` and others can become terminators. – mario May 13 '11 at 13:43
  • Regarding JS, if your variable contains `var a = "text'> – mario May 13 '11 at 13:44
  • Regarding your suggestion to use HTMLPurifier for safe1(): can you point me to an example where htmlentities() does NOT work? I don't really care about users being able to mangle the website layout or having strict compliance to XHTML, etc. etc. -- only that arbitrary Javascript is prevented from running. As a result, HTMLPurifier seems too slow for my needs. – user324289 May 13 '11 at 18:20
  • Also, one more question: what function would be appropriate for safe4 above? (I edited the original post) – user324289 May 13 '11 at 18:33
  • HTMLPurifier contains a CSS cleaner, not sure if it's suitable here. (There might be separate classes.) I would use a restrictive regex again, but that won't deal with IE's CSS exploits unless you disallow parens. – mario May 13 '11 at 19:08
  • Thanks mario -- not sure if you also saw my previous question above regarding an example where htmlentities does not work...thx – user324289 May 13 '11 at 19:36
  • No, `htmlentities` alone works fine - as long as you use it in a text area or always only within *quoted* attributes. – mario May 13 '11 at 19:46
  • Why are you advising regex for safe3? Isn't `json_encode((string)$stuff);` better? Regex won't do entity conversion anyway... – Christian Oct 23 '11 at 22:29
  • Also, for safe4: it depends on context. In general a regex against `AZaz09 "'#` should work great **(consider that `;` and `:` are the most dangerous characters here)**. Also note that some font names require strange characters. But if you're doing this for simple CSS keywords (`solid` or `red`) or measurements (`2px` or `5em 9px`), it should be enough. – Christian Oct 23 '11 at 22:31
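For the specific `width:` case in the question, the restrictive-regex idea from the comments could be sketched as below. This is stricter than the character whitelist suggested above: it only accepts a single CSS length and falls back to a default otherwise. safe4(), the unit list and the `auto` default are all assumptions for illustration.

```php
<?php
// Hypothetical helper: accept a single simple CSS length, else use a default.
function safe4(string $s, string $default = 'auto'): string
{
    // Up to 4 digits, optional decimals, then a known unit: 2px, 1.5em, 100%.
    // No parens, colons or semicolons can get through, so expression() and
    // extra declarations are rejected. \A and \z anchor the whole string.
    if (preg_match('/\A\d{1,4}(?:\.\d{1,3})?(?:px|em|rem|%)\z/', $s) === 1) {
        return $s;
    }
    return $default;
}

echo '.myclass {width:' . safe4('expression(alert(1))') . "}\n";
```

Validating against the exact shape you expect tends to be easier to reason about than stripping characters, but it has to be adapted per CSS property.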

See http://php.net/htmlentities and note the section on the optional third parameter, which takes a character encoding. You should use this instead of mb_convert_encoding. So long as the PHP file itself is saved with a UTF-8 encoding, that should work.

htmlentities($s, ENT_COMPAT, 'UTF-8');

As for injecting the variable directly into javascript, you might consider putting the content into a hidden html element somewhere else in the page instead and pulling the content out of the dom when you need it.
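A rough sketch of that hidden-element idea, using a hidden input so the value only ever sits in a quoted attribute. hidden_input() and the `xss-data` id are made-up names for illustration.

```php
<?php
// Hypothetical helper: carry user data in a hidden input instead of inline JS.
function hidden_input(string $id, string $value): string
{
    return sprintf(
        '<input type="hidden" id="%s" value="%s">',
        htmlspecialchars($id, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
    );
}

echo hidden_input('xss-data', '"; alert(1); //');
```

On the client you would then read it with something like `document.getElementById('xss-data').value`, which gives back the original string without it ever being parsed as script.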

The purifiers that you mention are used when you want to actually display html that a user submitted (as in, allow the browser to actually render). Using htmlentities will encode everything such that the characters will be displayed in the ui, but none of the actual code will be interpreted by the browser. Which are you aiming to do?

Louis Huppenbauer
wewals