2

When I use the htmlentities() function in my PHP projects, should I always pass a charset argument so that it handles characters from other languages correctly?

$my_variable = $some_data;
$output_variable = htmlentities($my_variable);

or...

$my_variable = $some_data;
$output_variable = htmlentities($my_variable, ENT_COMPAT, 'UTF-8');

If neither of the above, what is the proper way to use this function?

Thank you!

usnidorg
  • That depends: Is the data in `$some_data` *actually* encoded with UTF-8? – Gumbo Jun 26 '11 at 21:13
  • @Gumbo, The data comes from a MySQL database. Using htmlentities() works fine for me but the output looks incorrect in some other languages. – usnidorg Jun 26 '11 at 21:26

2 Answers

1

Generally speaking, you shouldn't use it at all. Declaring the encoding in the Content-Type HTTP header and then outputting real characters instead of entities is generally more efficient. (On the other hand, you should use htmlspecialchars to convert the characters that have special meaning in HTML to entities.)
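As a minimal sketch of that approach, assuming the data really is UTF-8: the helper name `e()` and the sample string are made up for illustration, and the header call is skipped when the script runs from the command line.

```php
<?php
// Hypothetical helper (not from the original post): escape text for HTML output.
function e(string $text): string
{
    // ENT_QUOTES escapes both double and single quotes.
    // 'UTF-8' is an assumption; it must match the actual encoding of $text.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Tell the browser which encoding the page uses (web requests only).
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Sample data; "\xC3\xBC" is the UTF-8 byte sequence for "ü".
$some_data = "<b>Gr\xC3\xBC\xC3\x9Fe</b> & \"quotes\"";

// Only <, >, &, and quotes become entities; the accented letters stay as real characters.
echo e($some_data), "\n";
```

Non-ASCII characters pass through untouched here, which is why the Content-Type header has to declare the same encoding.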

If you do use it, then you need to specify which encoding you are converting from if it isn't the default (ISO-8859-1 before PHP 5.4, UTF-8 from 5.4 onward). Specifying UTF-8 when your data isn't UTF-8 is less than helpful.
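A small sketch of what a charset mismatch looks like in practice. The byte strings below are illustrative values, written as escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "é" encoded as UTF-8 (two bytes) and as ISO-8859-1 (one byte).
$utf8   = "\xC3\xA9";
$latin1 = "\xE9";

// Charset matches the data: the character becomes its named entity.
$ok = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

// UTF-8 data declared as ISO-8859-1: each byte is treated as a separate character.
$garbled = htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1');

// ISO-8859-1 data declared as UTF-8: the byte sequence is invalid, and without
// ENT_SUBSTITUTE or ENT_IGNORE the function returns an empty string.
$empty = htmlentities($latin1, ENT_COMPAT, 'UTF-8');

echo $ok, "\n";               // &eacute;
echo $garbled, "\n";          // &Atilde;&copy;
var_dump($empty === '');      // bool(true)
```

Garbled output in some languages can be a symptom of the second case, so it is worth checking what encoding the database connection actually returns before choosing the argument.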

Quentin
  • When I display user-provided data from the database, I need to filter it to prevent execution of malicious code. I was told here on Stack Overflow to use htmlentities() when outputting data. Is this not correct? Should I use htmlspecialchars instead? – usnidorg Jun 26 '11 at 21:25
  • `htmlspecialchars` is sufficient, it handles the characters which have special meaning in HTML. – Quentin Jun 26 '11 at 21:28
  • @Quentin, I found this: htmlspecialchars() will NOT protect you against UTF-7 XSS exploits, which still plague Internet Explorer, even in IE 9. http://stackoverflow.com/questions/3623236/htmlspecialchars-vs-htmlentities-when-concerned-with-xss . I guess this is why I was using htmlentities. This is really confusing. – usnidorg Jun 26 '11 at 21:34
0

If you're unsure what charset you're using, the second way is better. However, on most occasions you'll find that the first one is fine.

James Long