
I am getting this error in my local site.

Warning (2): htmlspecialchars(): Invalid multibyte sequence in argument in [/var/www/html/cake/basics.php, line 207]

Does anyone know what the problem is, or what the solution should be?

Thanks.

gautamlakum

6 Answers


Be sure to set the encoding to UTF-8 if your files are encoded as such:

htmlspecialchars($str, ENT_COMPAT, 'UTF-8');

Before PHP 5.4, the default charset for htmlspecialchars was ISO-8859-1 (as of PHP 5.4 the default is 'UTF-8'), which might explain why things go haywire when it meets multibyte characters.
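A minimal sketch of that advice, assuming your source files and data really are UTF-8: route every call through one helper so the charset is always passed explicitly rather than left to the version-dependent default. The helper name `escape_utf8()` is hypothetical, not part of PHP or CakePHP.

```php
<?php
// Hypothetical helper: always pass the charset explicitly instead of
// relying on htmlspecialchars()'s default, which differs between PHP versions.
function escape_utf8(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escape_utf8('<a href="x">Grüße</a>'), "\n";
// prints: &lt;a href=&quot;x&quot;&gt;Grüße&lt;/a&gt;
```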

Sorcy
Tatu Ulmanen
  • Line 207 is here. $charset = 'UTF-8'; htmlspecialchars($text, ENT_QUOTES, $charset); // Line 207 – gautamlakum Sep 27 '10 at 13:06
  • For me, this problem turned out to be the reverse: my data's character set was actually 'ISO-8859-1' while I was trying to encode it as 'UTF-8' in htmlspecialchars. I switched the charset argument to 'ISO-8859-1' and that resolved the problem, at least until I can fully update everything to 'UTF-8'. – Kzqai Nov 06 '12 at 17:37
  • Starting from PHP 5.4.0, the default value of the 3rd parameter of `htmlspecialchars()` is `'UTF-8'` - this answer should be updated. – Walter Tross Mar 22 '13 at 12:03
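Following up on the comment about legacy ISO-8859-1 data, here is a rough sketch that picks the charset per string: if the bytes are not valid UTF-8 (checked with `mb_check_encoding()`), it assumes they are ISO-8859-1. That fallback is a heuristic and an assumption about your data, and `escape_mixed()` is a hypothetical name.

```php
<?php
// Sketch: choose the charset per string. Invalid UTF-8 is assumed
// (hypothetically) to be legacy ISO-8859-1 data.
function escape_mixed(string $text): string
{
    $charset = mb_check_encoding($text, 'UTF-8') ? 'UTF-8' : 'ISO-8859-1';
    $escaped = htmlspecialchars($text, ENT_QUOTES, $charset);

    // Normalise the result to UTF-8 so the page uses a single encoding.
    return $charset === 'UTF-8'
        ? $escaped
        : mb_convert_encoding($escaped, 'UTF-8', 'ISO-8859-1');
}

echo escape_mixed("caf\xE9 <b>"), "\n"; // Latin-1 input, printed as UTF-8
```

Every byte string is valid ISO-8859-1, so the fallback never fails; the trade-off is that a Latin-1 string which happens to also be valid UTF-8 will be treated as UTF-8.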

I ran into this error in production and found this great post about it:

http://insomanic.me.uk/post/191397106/php-htmlspecialchars-htmlentities-invalid

It appears to be a bug in PHP (on CentOS, at least) that displays this error even when display_errors is Off!

gingerCodeNinja

You are feeding corrupted character data into the function, or not specifying the right encoding.

I had this issue a while ago. The old behavior (prior to PHP 5.2.7, I believe) was to return the string despite the corruption, but since that version it emits this warning instead.

My solution involved writing a script to feed my strings through iconv using the //IGNORE modifier to remove corrupted data.

(We had a corrupted database in which some strings were UTF-8 and some Latin-1, usually with incorrectly defined character types on the columns.)

Looking at the comment on Tatu's answer, I would start by inspecting (and experimenting with) the contents of the $charset variable.

berty
  • I agree. I've passed user data through iconv or mb_convert_encoding(), with the 'from' and 'to' charsets the same. There's usually an option to strip invalid characters. – Jeff Standen Sep 28 '10 at 03:58
  • Corrupted data here as well, mb_convert_encoding($var, 'UTF-8') did the job. – Jonah Braun Jul 25 '12 at 02:59
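A sketch of the "strip corrupted data" approach from this answer and its comments, assuming the mbstring extension is available. It swaps in `mb_convert_encoding()` from UTF-8 to UTF-8, as the comments suggest, instead of iconv's `//IGNORE`, whose behavior varies between platforms. With the substitute character set to `'none'`, invalid bytes are dropped rather than replaced with `'?'`. The function name is hypothetical.

```php
<?php
// Hypothetical cleanup helper: remove invalid UTF-8 byte sequences.
function strip_invalid_utf8(string $text): string
{
    $previous = mb_substitute_character();
    mb_substitute_character('none'); // drop invalid bytes instead of emitting '?'
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous); // restore the global setting
    return $clean;
}

var_dump(mb_check_encoding(strip_invalid_utf8("caf\xC3\xA9 \xFF ok"), 'UTF-8')); // bool(true)
```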

The code that avoids the error is:

htmlentities($string, ENT_IGNORE, 'UTF-8');

Besides this, you can also use str_replace to replace some bad characters as needed and then call htmlentities.

Have a look at this RSS feed: it replaced the greater-than sign with the &gt; entity, which might not look nice when reading the feed. You can replace it with something like a "-" or ")" sign instead.
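A minimal sketch combining both suggestions, assuming UTF-8 input: pre-replace characters with `str_replace()`, then encode with `ENT_IGNORE` so invalid byte sequences are silently dropped. The replacement map and the name `encode_for_feed()` are hypothetical. Note that the PHP manual discourages `ENT_IGNORE` for security reasons; `ENT_SUBSTITUTE` (PHP 5.4+) replaces invalid sequences with U+FFFD instead.

```php
<?php
function encode_for_feed(string $text): string
{
    // Hypothetical map: replace these before encoding so they never become entities.
    $text = str_replace(['>', '<'], [')', '('], $text);
    // ENT_IGNORE drops invalid UTF-8 sequences instead of failing.
    return htmlentities($text, ENT_QUOTES | ENT_IGNORE, 'UTF-8');
}

var_dump(encode_for_feed("caf\xC3\xA9 > tea \xFF"));
// string(18) "caf&eacute; ) tea "
```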

Sailab Rahi

I had the same problem because I was using substr on a UTF-8 string.
The error was infrequent and seemingly random: it occurred only when the string was cut in the middle of a multibyte character!

mb_substr solved the problem :)
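A small illustration of the difference, assuming the script file itself is saved as UTF-8: `substr()` counts bytes and can cut a two-byte character in half, while `mb_substr()` counts characters.

```php
<?php
$s = "äbc"; // 'ä' is two bytes in UTF-8

$byteCut = substr($s, 0, 1);             // "\xC3": half of 'ä', invalid UTF-8
$charCut = mb_substr($s, 0, 1, 'UTF-8'); // "ä": one whole character

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```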

CoR

That's actually one of the most frequent errors I get.

Sometimes I don't use __() translations, just plain German text containing äöü. In that case it is especially important to mind the encoding of the files.

So make sure you save files that contain special characters as UTF-8.
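To find files that were not saved as UTF-8, a rough sketch like the following may help: it walks a directory and reports files whose bytes are not valid UTF-8 according to `mb_check_encoding()`. The function name, the default `php`/`ctp` extensions, and the path in the usage line are assumptions; adjust them to your app.

```php
<?php
// Hypothetical scanner: list files under $dir that are not valid UTF-8.
function find_non_utf8_files(string $dir, array $extensions = ['php', 'ctp']): array
{
    $bad = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Skip directories and files with extensions we don't care about.
        if (!$file->isFile() || !in_array($file->getExtension(), $extensions, true)) {
            continue;
        }
        if (!mb_check_encoding(file_get_contents($file->getPathname()), 'UTF-8')) {
            $bad[] = $file->getPathname();
        }
    }
    sort($bad);
    return $bad;
}

// Usage (hypothetical path): print_r(find_non_utf8_files('/var/www/html/app/views'));
```

Pure-ASCII files pass, since ASCII is valid UTF-8; only files containing bytes such as a Latin-1 `ä` are flagged.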

mark