This question tells me

htmlentities is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.

Sounds like htmlentities is the one I want.

Then this question tells me I need the "UTF-8" argument to get rid of this error:

Invalid multibyte sequence in argument

So, here is my encoding wrapper function (to normalise behaviour across different PHP versions)

function html_entities ($s)
{
    return htmlentities ($s, ENT_COMPAT /* ENT_HTML401 */, "UTF-8");
}

I am still getting the "multibyte sequence in argument" error.

Here is a sample string which triggers the error, and its hex encoding:

Jigue à Baptiste

4a 69 67 75 65 20 e0 20 - 42 61 70 74 69 73 74 65

I notice that the à is encoded as the single byte 0xe0, which is above 0x80.

What am I doing wrong?

Community
spraff
  • I love those *"What am I doing wrong?"* questions, because the answer is so simple: *Something!* – hakre Jun 28 '12 at 12:56
  • Also you're not doing anything wrong. You pass a string into a function that refuses to process it, and you get an error message about that. That's not wrong but right, because the string you pass in there is not correct. It would be really wrong if you actually got your undefined but expected result. So what are you doing wrong? *"You have the wrong expectations, that's all."* So what did you expect that function to do and why? – hakre Jun 28 '12 at 12:59

2 Answers

Your string is encoded in ISO-8859-1, not UTF-8. Plain and simple.

function html_entities ($s)
{
    // The input is ISO-8859-1, so pass that charset instead of "UTF-8".
    return htmlentities ($s, ENT_COMPAT /* ENT_HTML401 */, "ISO-8859-1");
}
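
If the wrapper may receive both UTF-8 and ISO-8859-1 strings, one option is to normalise the input before encoding it. The following is only a sketch: the name html_entities_any is hypothetical, it needs the mbstring extension, and it assumes that anything which is not valid UTF-8 is ISO-8859-1, which may not hold for your data.

```php
<?php
// Hypothetical helper, not part of PHP: normalise to UTF-8, then encode.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function html_entities_any($s)
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        // Re-encode the ISO-8859-1 bytes as UTF-8.
        $s = mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
    }
    return htmlentities($s, ENT_COMPAT, 'UTF-8');
}

// The sample string from the question, with à as the single byte 0xE0.
echo html_entities_any("Jigue \xE0 Baptiste"), "\n"; // Jigue &agrave; Baptiste
```

Guessing the source encoding like this is a workaround; fixing the encoding where the string originates is more reliable.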
hakre
deceze

If à is encoded as 0xE0, then you didn't save the file in UTF-8 encoding. A lone 0xE0 followed by 0x20 is invalid UTF-8, because 0xE0 can only start a three-byte sequence. In UTF-8, à should be 0xC3 0xA0.

Save your file in UTF-8 encoding. Also see UTF-8 all the way through

If you saved it correctly in UTF-8, the hex should look like so:

4A 69 67 75 65 20 C3 A0 20 42 61 70 74 69 73 74 65
J  i  g  u  e     à        B  a  p  t  i  s  t  e
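
To check which encoding a string actually has, you can dump its bytes and ask mbstring whether they form valid UTF-8. This is a minimal sketch; the two sample strings are written with explicit byte escapes so the result does not depend on how this file itself is saved.

```php
<?php
$samples = [
    "Jigue \xE0 Baptiste",      // à as one byte (ISO-8859-1)
    "Jigue \xC3\xA0 Baptiste",  // à as two bytes (UTF-8)
];

foreach ($samples as $s) {
    // bin2hex() gives one long hex string; split it into byte pairs.
    $hex = implode(' ', str_split(bin2hex($s), 2));
    $status = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    echo $hex, ' => ', $status, "\n";
}
```

The first line should be reported as not valid UTF-8 and the second as valid.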
Community
Esailija