HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the OWASP rules are all about.
More info in the OWASP XSS Prevention Cheat Sheet.
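To make "use the escape syntax for the part of the document" concrete, here is a minimal sketch using standard PHP functions. The helper names (`escape_html`, `escape_url_param`, `escape_js_string`) are hypothetical, and this covers only three contexts (HTML text/quoted attributes, a URL query parameter, a JavaScript string inside `<script>`). It is not a complete implementation of the OWASP rules; CSS contexts, for instance, are not handled.

```php
<?php
// Sketch: pick the escaping function that matches the output context.
// The helper names are hypothetical; the PHP functions they wrap are standard.

// HTML body text and QUOTED attribute values.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// A single query-string parameter inside a URL (the whole URL is HTML-escaped afterwards).
function escape_url_param(string $value): string
{
    return rawurlencode($value);
}

// A value embedded in an inline <script> block as a JavaScript string literal.
function escape_js_string(string $value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \u00XX escapes,
    // so the value cannot close the <script> element early.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$input = "</script><script>alert('x')</script>";

$html = '<p>' . escape_html($input) . '</p>';
$link = '<a href="' . escape_html('/search?q=' . escape_url_param($input)) . '">search</a>';
$js   = '<script>var q = ' . escape_js_string($input) . ';</script>';

echo $html, "\n", $link, "\n", $js, "\n";
```

Note the layering in `$link`: the parameter is URL-encoded first, then the finished URL is HTML-escaped because it sits inside an attribute.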
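The correct way to use htmlspecialchars is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
ENT_QUOTES makes it escape single quotes as well as double quotes (before PHP 8.1 the default flags escaped only double quotes), and the third parameter should match the character encoding your page is served in.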
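A quick sketch of what that call produces, and of what changes if you leave out ENT_QUOTES (ENT_COMPAT was the default before PHP 8.1). The sample string is just an illustration.

```php
<?php
// Escape a value for HTML body text or a quoted attribute.
// ENT_QUOTES escapes both " and '; 'UTF-8' must match the page's declared charset.
$raw = '<a href="x">O\'Reilly & co</a>';

$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
// &lt;a href=&quot;x&quot;&gt;O&#039;Reilly &amp; co&lt;/a&gt;

// With ENT_COMPAT (the pre-8.1 default) the single quote is left as-is,
// which matters if you ever put the value inside a '...' attribute.
echo htmlspecialchars($raw, ENT_COMPAT, 'UTF-8'), "\n";
// &lt;a href=&quot;x&quot;&gt;O'Reilly &amp; co&lt;/a&gt;
```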
Also, keep in mind that a user could send a username like "Jim onclick=alert('hi')".
If you don't wrap the value attribute in quotes, you'd get something like:
<input type="text" name="username" value=Jim onclick=alert('hi')>
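ALWAYS use quotes around attributes. Escaping alone doesn't save you here: htmlspecialchars leaves spaces and = untouched, and the browser decodes &#039; back to ' inside the attribute, so the onclick handler still runs. Even when the value isn't user input, quoting attributes is a good habit to get into: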
<input type="text" name="username" value="<?php echo htmlspecialchars($_POST['username'], ENT_QUOTES, 'UTF-8'); ?>">
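The difference between the two forms can be sketched by building both strings from the same escaped value. The variable names are illustrative only.

```php
<?php
// Build an unquoted and a quoted attribute from the same escaped value.
$username = "Jim onclick=alert('hi')";
$safe = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

// Unquoted: the space ends the value, so "onclick=..." becomes a separate attribute.
// The browser decodes &#039; back to ' inside it, so the handler still runs.
$unquoted = '<input type="text" name="username" value=' . $safe . '>';

// Quoted: everything stays inside value="...", and a " in the input would become &quot;.
$quoted = '<input type="text" name="username" value="' . $safe . '">';

echo $unquoted, "\n", $quoted, "\n";
```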
With these things in mind, you should be covered in most cases. However, if you want to be really picky, do read the OWASP document mentioned above; it's really helpful.
UPDATE
There seems to be some controversy about htmlspecialchars vs htmlentities. I'm going to sum up a few things I've been reading, and you can choose whichever of the two suits you:
UTF-7 problem
Both htmlspecialchars and htmlentities are susceptible to the infamous UTF-7 problem: neither of them supports this encoding. As some of the comments on the SO posts linked at the bottom of this answer put it:
If your page/browser is vulnerable to the UTF-7 issue, htmlentities isn't going to help you any more than htmlspecialchars will. Both of them will interpret the UTF-7 encodings of < and > as just "safe" ASCII chars and pass them through.
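A small sketch of that pass-through: "+ADw-" and "+AD4-" are the UTF-7 encodings of < and >. If a browser decodes the page as UTF-7, the string below reads as a script tag, yet to both PHP functions it is ordinary ASCII.

```php
<?php
// "+ADw-" and "+AD4-" are the UTF-7 encodings of "<" and ">".
// To htmlspecialchars/htmlentities these are plain ASCII letters and symbols,
// so both functions return the string unchanged.
$utf7Payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

$a = htmlspecialchars($utf7Payload, ENT_QUOTES, 'UTF-8');
$b = htmlentities($utf7Payload, ENT_QUOTES, 'UTF-8');

var_dump($a === $utf7Payload, $b === $utf7Payload);
```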
Solution: don't use UTF-7, and make sure escaping is done with the same character encoding the document is served in, to avoid disappearing quotes. Declare in the header of your page the same encoding you pass to htmlspecialchars (UTF-8, for instance):
header('Content-Type: text/html; charset=utf-8');
If you don't pass the third parameter, htmlspecialchars defaults to UTF-8 from PHP 5.4 on (since PHP 5.6 the default comes from the default_charset setting, which is UTF-8 out of the box), so you should be safe even if you forget it.
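One way to keep the header and the escaping in sync is a single constant, sketched below (the constant name PAGE_CHARSET is hypothetical). It also shows what happens when the input isn't valid in the declared charset: htmlspecialchars returns an empty string unless you add ENT_SUBSTITUTE (ENT_SUBSTITUTE is part of the default flags only since PHP 8.1).

```php
<?php
// Keep one charset for both the Content-Type header and the escaping calls.
const PAGE_CHARSET = 'UTF-8';

if (!headers_sent()) {
    header('Content-Type: text/html; charset=' . PAGE_CHARSET);
}

// "caf\xE9" is "café" in ISO-8859-1, but it is not valid UTF-8.
$latin1Bytes = "caf\xE9";

// Invalid input for the given charset: without ENT_SUBSTITUTE the result is "".
$strict = htmlspecialchars($latin1Bytes, ENT_QUOTES, PAGE_CHARSET);

// With ENT_SUBSTITUTE the bad byte is replaced by U+FFFD instead.
$substituted = htmlspecialchars($latin1Bytes, ENT_QUOTES | ENT_SUBSTITUTE, PAGE_CHARSET);

var_dump($strict, bin2hex($substituted));
```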
Check out this interesting article on the topic (it also has some more useful info about XSS): LINK
htmlentities() vs. htmlspecialchars()
htmlspecialchars
- Use it when there is no need to encode every character that has an HTML entity equivalent. It sends less code to the client, and that isn't a matter to be taken lightly: less code sent, faster web pages. Its output is also more readable than what htmlentities produces.
- Use it when you're writing XML data: named HTML entities such as &eacute; aren't defined in XML, so you can't use them in an XML file.
htmlentities
- Use it when there is a need to encode every character that has an entity equivalent, for example if your pages use an encoding such as ASCII or Latin-1 instead of UTF-8.
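The practical difference is easiest to see side by side. A minimal sketch (the sample text is arbitrary; the accented letters are written as \u{...} escapes so the source bytes are guaranteed UTF-8). The last call uses the ENT_XML1 flag, which is one way to escape for XML output.

```php
<?php
// "Café & crème <b>" written with escapes so the bytes are UTF-8 regardless of file encoding.
$text = "Caf\u{E9} & cr\u{E8}me <b>";

// Only &, <, >, " and ' are converted; é and è stay as UTF-8 bytes.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Every character with a named HTML entity is converted (é -> &eacute;, è -> &egrave;).
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// For XML output: ENT_XML1 uses &apos; for ' and emits no HTML-only entities.
$xml = htmlspecialchars("O'Reilly & co", ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $special, "\n", $entities, "\n", $xml, "\n";
printf("%d vs %d bytes\n", strlen($special), strlen($entities));
```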
Check the documentation I provided and these SO questions:
htmlentities() vs. htmlspecialchars()
htmlspecialchars vs htmlentities when concerned with XSS
and choose the one that suits you best.