
If the following statements are true,

  • All documents are served with the HTTP header Content-Type: text/html; charset=UTF-8.
  • All HTML attributes are enclosed in either single or double quotes.
  • There are no <script> tags in the document.

are there any cases where htmlspecialchars($input, ENT_QUOTES, 'UTF-8') (converting &, ", ', <, > to the corresponding HTML entities) is not enough to protect against cross-site scripting when generating HTML on a web server?

Alf Eaton
  • Show some of the back-end code. If all of the printing is just like `echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8')`, then maybe it's enough; if you are passing the input to functions like `eval()`, you might have other security risks. – Royal Bg Oct 25 '13 at 07:59
  • Yes, this is only about HTML output. `eval` of untrusted input is also dangerous, but outside the scope of this question. – Alf Eaton Oct 25 '13 at 15:29

3 Answers


htmlspecialchars() is enough to prevent document-creation-time HTML injection with the limitations you state (i.e. no injection into the tag itself outside an attribute value, and no unquoted attributes).
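For reference, a minimal sketch of the two contexts this answer treats as covered: element text content and a quoted attribute value. The helper name `h()` and the sample payload are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper wrapping the exact call from the question.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Sample payload that tries to break out of a quoted attribute and inject a tag.
$input = "\" onmouseover='alert(1)' x=\"<script>alert(2)</script>";

// Both uses are in covered contexts: a double-quoted attribute value
// and element text content. Every quote and angle bracket is escaped.
$html = '<p title="' . h($input) . '">' . h($input) . '</p>';
echo $html, "\n";
```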

However, there are other kinds of injection that can lead to XSS, and:

There are no <script> tags in the document.

this condition doesn't cover all cases of JS injection. You might for example have an event handler attribute (requires JS-escaping inside HTML-escaping):

<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):

<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
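To see why these are bad, here is a sketch using the payload quoted in the comments below (`');your_code_here();//`, with `alert(1)` standing in for the attacker's code). The browser entity-decodes an attribute value before handing it to the JavaScript parser; `html_entity_decode()` is used here only to approximate that step.

```php
<?php
// Payload from the comments; alert(1) stands in for the attacker's code.
$xss = "');alert(1);//";

// What the bad template emits: only HTML-escaping, inside a JS string literal.
$escaped = htmlspecialchars($xss, ENT_QUOTES, 'UTF-8');
$attr = "alert('" . $escaped . "')";
echo '<div onmouseover="' . $attr . '">', "\n";
// <div onmouseover="alert('&#039;);alert(1);//')">

// Approximate the browser's entity-decoding of the attribute value:
// this is the source text the JavaScript parser actually receives.
$jsSeen = html_entity_decode($attr, ENT_QUOTES, 'UTF-8');
echo $jsSeen, "\n";
// alert('');alert(1);//')   (the payload closes the string and runs)
```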

It is usually best to avoid these constructs anyway, and especially when templating: writing <?php echo htmlspecialchars(urlencode(json_encode($something))) ?> is quite tedious.
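A minimal sketch of that layered escaping, assuming the same payload as above. The JS snippet no longer wraps the value in quotes, because `json_encode()` emits a complete quoted string literal. This sketch swaps in `rawurlencode()` for `urlencode()`: the latter turns spaces into `+`, which a `javascript:` URL does not decode back to a space.

```php
<?php
$xss = "');alert(1);//";

// Event handler: JS-escape first (json_encode() produces a complete, quoted
// JS string literal), then HTML-escape the whole handler for the attribute.
$handler = 'alert(' . json_encode($xss) . ')';
$escapedHandler = htmlspecialchars($handler, ENT_QUOTES, 'UTF-8');
echo '<div onmouseover="' . $escapedHandler . '">', "\n";

// javascript: link: JS-escape, then percent-encode, then HTML-escape.
// rawurlencode() encodes spaces as %20 rather than +.
$encodedJs = rawurlencode(json_encode($xss));
$href = 'javascript:alert(' . $encodedJs . ')';
echo '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">', "\n";
```

After the browser decodes each layer, the JavaScript parser sees `alert("');alert(1);\/\/")`, which merely displays the payload as text.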

And... injection issues can happen on the client-side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.

And... XSS has a wider range of causes than just injections. Other common causes are:

  • allowing the user to create links, without checking for known-good URL schemes (javascript: is the most well-known harmful scheme but there are more)

  • deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode which is invariably exploitable)

  • allowing the user to upload files (which can through various means be reinterpreted as HTML or XML)
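For the first point, a minimal allow-list check might look like the following. The function name `isSafeLink()` and the chosen schemes (`http`, `https`, `mailto`) are assumptions for illustration; relative URLs are simply rejected, and an accepted URL still has to go through `htmlspecialchars()` when written into the `href` attribute.

```php
<?php
// Hypothetical helper: accept a user-supplied link only if its scheme is on
// an allow-list. Anything without a recognisable scheme is rejected.
function isSafeLink(string $url): bool
{
    $allowed = ['http', 'https', 'mailto'];
    $scheme = parse_url(trim($url), PHP_URL_SCHEME);
    // parse_url() returns null (no scheme) or false (malformed URL) on failure.
    return is_string($scheme) && in_array(strtolower($scheme), $allowed, true);
}

$candidate = 'javascript:alert(1)';
if (!isSafeLink($candidate)) {
    $candidate = '#'; // fall back to a harmless link target
}
// The allowed URL is still HTML-escaped for the quoted attribute.
echo '<a href="' . htmlspecialchars($candidate, ENT_QUOTES, 'UTF-8') . '">link</a>', "\n";
```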

Ilmari Karonen
bobince
  • What's an example value of `$xss` that would cause ` ` to be dangerous? – Alf Eaton Oct 25 '13 at 15:30
  • There used to be a response from @bobince here, but it's disappeared - it was `');your_code_here();//` https://web.archive.org/web/20170203212138/https://stackoverflow.com/questions/19584189/when-used-correctly-is-htmlspecialchars-sufficient-for-protection-against-all-x#answer-19587643 – Alf Eaton Apr 14 '20 at 12:27

Assuming you are not using an older PHP version (5.2 or so), htmlspecialchars is "safe" (taking the back-end code into consideration, of course, as @Royal Bg mentions).

In older PHP versions, malformed UTF-8 sequences made this function vulnerable.
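On current PHP versions the function refuses malformed input instead: per the PHP manual, htmlspecialchars() returns an empty string for an invalid code unit sequence in the given encoding unless ENT_IGNORE or ENT_SUBSTITUTE is set. A small sketch, assuming a lone continuation byte as the malformed input:

```php
<?php
// A lone continuation byte (0x80) is not valid UTF-8.
$malformed = "abc\x80";

// With only ENT_QUOTES, invalid input yields an empty string rather than
// passing the broken bytes through to the page.
$strict = htmlspecialchars($malformed, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE replaces the invalid sequence with U+FFFD instead.
$substituted = htmlspecialchars($malformed, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict);             // string(0) ""
echo bin2hex($substituted), "\n"; // 616263efbfbd
```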

My 2 cents: always sanitize/validate your inputs by specifying what is allowed, instead of just escaping/encoding everything.

For example, if someone must enter a telephone number, I can imagine the following characters being allowed: 0123456789()+-. and a space; all others are just ignored/stripped out.

The same would apply to addresses etc.; someone specifying UTF-8 characters for dots/blocks/hearts etc. in their address must be mentally ill...
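The telephone-number allow-list described above can be sketched with a single `preg_replace()`; the helper name `stripToPhoneChars()` is hypothetical.

```php
<?php
// Hypothetical helper: keep only the characters listed in the answer
// (digits, parentheses, plus, minus, dot and space) and drop everything else.
function stripToPhoneChars(string $input): string
{
    return preg_replace('/[^0-9()+\-. ]/', '', $input);
}

$phone = stripToPhoneChars('+1 (555) 123-4567<script>');
echo $phone, "\n"; // +1 (555) 123-4567

// As deceze's comment argues, still escape the value on output.
echo htmlspecialchars($phone, ENT_QUOTES, 'UTF-8'), "\n";
```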

Machavity
Ronald Swets
  • And what if you have, say, a forum which explicitly allows any sort of freeform text? – deceze Oct 25 '13 at 09:21
  • That's why I mentioned "instead of just escaping"... of course freeform text is very valid. But what would you do with the freeform text? I cannot imagine you just output the freeform text on a page... and if you do, it would be a "trusted" source which you allowed to do so. – Ronald Swets Oct 25 '13 at 10:13
  • I simply *escape* freeform text. Validation is fine to ensure you get valid values. Always apply proper escaping to everything and you are fine. *Sanitisation*, or the *reformatting* and *alteration* of data, is only really useful to bring varied input into a standard form, say to strip all non-numbers from telephone numbers and format the number into your favourite standardised format. Stripping random characters from random text is typically not helpful at all for freeform text. – deceze Oct 25 '13 at 10:19
  • Then we both agree: stripping is of course only sane if the context is known. And even if the context is known (i.e. "fill in your favorite javascript code here"), escaping is a must. – Ronald Swets Oct 25 '13 at 14:31

As far as I know, yes. I can't imagine a case where it doesn't prevent XSS. If you want to be completely safe, use strip_tags().

Realitätsverlust