
I run a personal website that has always displayed accented characters correctly. Now, suddenly, it no longer does. Oddly, even the localhost version is affected.

The application has not been changed in this regard for years. Here is what it does, in the given order:

  1. mysql database set to collation utf8_general_ci

  2. Application sends these two queries: "SET NAMES 'utf8' COLLATE 'utf8'" and "SET CHARACTER_SET 'utf8'"

  3. PHP sends the following headers before anything is printed: header('Content-type: text/xml; charset=utf-8'."\r\n"); header('Content-transfer-encoding: utf-8'."\r\n");

  4. Each web page shows a meta tag as follows: <meta http-equiv="content-type" content="text/html; charset=utf-8" />

Yet now, suddenly, the characters are displayed all wrong. If I replace the characters manually, they are shown as intended. But I cannot work out whether something "corrupted" the database, or what. And I certainly cannot fix hundreds of posts by hand.

Any idea why this strange thing suddenly happens and suggestions about how to fix it?

Instance of a wrong line: "Non ho mai avuto l' opportunitÃ  di incontrarti di persona. Non so se Ã¨ perchÃ¨ non ho cercato abbastanza l' occasione o perchÃ¨" etc...

alberto
  • So the App did not change! But did something else change? Apache/MYSQL or anything else you can think of – RiggsFolly Aug 15 '16 at 16:02
  • use utf8_unicode_ci instead of general (http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci/766996#766996) – Snowman Aug 15 '16 at 16:03
  • on my part absolutely nothing changed. The codes are the same because they worked correctly over the years. Also the remote database appears still set to utf8_general_ci. If the hosting server updated something, I don't know. But it's register.com and I doubt they would make updates there that can affect charsets; they serve a pool of clients from way too many countries to afford an error in charset handling. – alberto Aug 15 '16 at 16:07
  • Have you tried a different browser? You can force any browser to interpret all pages as Latin-1 or whatnot, which could cause the phenomenon. – YetiCGN Aug 15 '16 at 16:13
  • same thing on Firefox, Opera, Chrome. I usually solve things by myself but this one is really puzzling me. – alberto Aug 15 '16 at 16:15
  • if you want to check the behaviour yourself here is a link (Italian text so it carries accented) http://www.fullposter.com/snippets.php?snippet=485#topic – alberto Aug 15 '16 at 16:17
  • if pdo ... do you provide the charset in your connection string? also i would check if the database is corrupted (phpMyAdmin or something) or if the script actually screws everything up – Jakumi Aug 15 '16 at 16:20
  • all the codes sent are exactly in the fashion and order provided in my post. Not sure what pdo stands for, at any rate it all worked smoothly in the past years. Now, suddenly, pam. No idea why. – alberto Aug 15 '16 at 16:23
  • if it may help, I just edited the linked page and placed ONE corrected à accent in the line "Non ho mai avuto l' opportunitÃ  di incontrarti di persona. Non so se Ã¨ perchÃ¨ non ho cercato abbastanza l' occasione o perchÃ¨", which NOW reads "Non ho mai avuto l' opportunità di incontrarti di persona. Non so se Ã¨ perchÃ¨ non ho cercato abbastanza l' occasione o perchÃ¨". So in short it seems that it does read accented chars as intended if they are supplied anew, but for some reason the DB has corrupted all the accented chars already there. Any idea how to fix this without having to retype hundreds of posts manually? – alberto Aug 15 '16 at 16:30
  • I assume that the script itself is probably okay, and the database is probably too. PDO is one way in php to connect to a database (mysqli would be a different, mysql is a third broken one). PDO connects via a connectionstring in the form `mysql:host;dbname=database` (or something like that) and I noticed that all the utf8 settings don't seem to work, if you don't add `;charset=utf8` to the connectionstring (see http://php.net/manual/en/ref.pdo-mysql.connection.php) and apparently some libraries were changed – Jakumi Aug 15 '16 at 16:37
  • well wait a minute, one thing changed. Since they deprecated the mysql functions I moved to mysqli weeks ago (in short I just replaced all mysql_something() in my codes with mysqli_something(), including the connection parameter in the arguments when needed). Obviously the codes are right or they would exit on errors. But then, how could a move to mysqli corrupt a database? It affects only outputs, not older DB contents. I don't know what this pdo thing is, I will try to check online. Any ideas are welcome of course; at the moment it's still a riddle to me. – alberto Aug 15 '16 at 16:43
  • i connect using mysqli_connect('hosthere', 'userhere', 'passwordhere'); mysqli_select_db($CONNECTION, 'databasehere'); mysqli_set_charset($CONNECTION, 'utf8'); – alberto Aug 15 '16 at 16:46
  • wt... bingo! mysqli_set_charset($CONNECTION, 'utf8'); must be commented out! – alberto Aug 15 '16 at 16:47
  • Jakumi you nailed it. Thanks so much. You put me on the right path, it was a mysqli thing ! You guys rock :) – alberto Aug 15 '16 at 16:49
  • sounds weird though. ;o/ – Jakumi Aug 15 '16 at 16:49
  • indeed. But at any rate you put me on the right path. Weird I agree. With mysql_set_charset not a peep! – alberto Aug 15 '16 at 16:52
  • ok, for future reference I can confirm. I just tested that mysql_set_charset('utf8'); does NOT cause issues in the given setting, BUT mysqli_set_charset($CONNECTION, 'utf8'); in the very SAME setting DOES. Apparently, if we migrate from the deprecated mysql to mysqli, the latter may garble utf8 if, after mysqli_set_charset, further statements issue utf8 commands again, such as "SET NAMES 'utf8' COLLATE 'utf8'" and "SET CHARACTER_SET 'utf8'" and header('Content-type: text/xml; charset=utf-8'."\r\n"); header('Content-transfer-encoding: utf-8'."\r\n"); So, beware when migrating to mysqli! – alberto Aug 15 '16 at 17:04
  • `utf8` is not a valid collation. – Rick James Aug 15 '16 at 23:56
  • @Snowman - those two collations only differ in multi-character (not multi-byte) utf8 sequences. – Rick James Aug 15 '16 at 23:59

1 Answer


`Ã¨` is Mojibake for `è`. Were you expecting a grave-e? Regardless of what changed or did not change, let's look at fixing it.
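To see where that two-character sequence comes from, here is a minimal sketch. It is in Python rather than PHP only so that it runs without a database, and it assumes the misreading step treats the bytes as latin1, which is the usual culprit with MySQL; the variable names are illustrative.

```python
# "è" encoded as UTF-8 is two bytes.
utf8_bytes = "è".encode("utf-8")        # b'\xc3\xa8'

# Somewhere in the chain those two bytes are read as latin1,
# one character per byte, which yields the two characters "Ã" and "¨".
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex(), misread)        # c3a8 Ã¨
```

The bytes themselves are fine at this point; only their interpretation is wrong, which is why the declared charset of the connection matters so much.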

See this and look for Mojibake. It says to check/fix these:

  • The bytes to be stored need to be UTF-8-encoded. Fix this.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
  • HTML should start with `<meta charset=UTF-8>`.
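The first item in the checklist can be verified mechanically. The sketch below is Python, not part of the PHP application, and `is_valid_utf8` is a hypothetical helper name; it only tests whether a byte string decodes cleanly as UTF-8.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("perchè".encode("utf-8")))    # True
print(is_valid_utf8("perchè".encode("latin-1")))  # False
```

Note that double-encoded text is itself valid UTF-8, so this check catches latin1 bytes sneaking in but cannot detect Mojibake that has already been stored.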

Also see the technique for checking the HEX of what is stored for è:

  • utf8 hex is C3A8.
  • Hex C383C2A8 means you have "double encoding"; that will lead to other issues.
  • E8 is the latin1 hex -- I doubt if you will see this.
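The three hex cases can be told apart, and the double-encoded one reversed, with a few lines of code. This is a hedged Python sketch, not a drop-in fix for the application: `classify` and `undo_double_encoding` are hypothetical names, and the repair assumes exactly one round of latin1 misinterpretation. In MySQL the hex dump itself would come from something like `SELECT HEX(col)` on the affected column.

```python
def classify(stored_hex: str) -> str:
    """Classify the stored bytes of a single 'è' by their hex dump."""
    known = {
        "C3A8": "utf8 (correct)",
        "C383C2A8": "double-encoded",
        "E8": "latin1",
    }
    return known.get(stored_hex.upper(), "unknown")


def undo_double_encoding(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes read as latin1' Mojibake."""
    return text.encode("latin-1").decode("utf-8")


print(classify("c383c2a8"))               # double-encoded
print(undo_double_encoding("perchÃ¨"))    # perchè
```

`undo_double_encoding` raises `UnicodeEncodeError` or `UnicodeDecodeError` on text that is not Mojibake of this kind, so try it on a copy of the data first and treat any exception as a sign that the row should be left alone.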
Rick James
  • the issue was solved thank you, apparently mysql_set_charset('utf8') did not cause any trouble but migrating to mysqli after the deprecation of mysql_* using mysqli_set_charset('utf8') did cause the issue. – alberto Aug 27 '16 at 08:32
  • @alberto Hmmm... That is strange. `mysqli_set_charset` should have done the same thing. _But_ that is the wrong syntax; instead: `$link->set_charset('utf8')` _or_ `mysqli_set_charset($link, 'utf8')`. – Rick James Aug 27 '16 at 19:14
  • i know. I can assure you that the application was unmodified for years and as long as mysql_set_charset was in place, no problem. As soon as I migrated to mysqli and so used also mysqli_set_charset (and no notices or errors were thrown) , pam garbled utf8. I even tried to turn that single statement back into mysql_set_charset and utf8 was no longer garbled. Funny, agreed. – alberto Aug 28 '16 at 14:47
  • ps the syntax was correct the first argument was the connection link I just forgot to put that in the comment here. If you check previous comments you will see it was in place. – alberto Aug 28 '16 at 14:50
  • Was MySQL upgraded meanwhile? Some defaults have changed. – Rick James Aug 28 '16 at 16:27
  • Did you change _all_ calls from `mysql_*` to `mysqli_*`? – Rick James Aug 28 '16 at 16:27