0

I'm migrating my existent database into another server. To achieve that I've exported and imported the database using phpMyAdmin SQL queries. Everything works fine, except that some UTF-8 characters appear broken in the website. I fetch them using the same PHP code (on a different server but with same PHP extensions and version).

Example of a string as I see it on the new website and on the databases (both old and new) (using phpMyAdmin): péri-prothétique

Example of a string as I see it in the old website péri-prothétique

As you can see, PHP used to automatically encode the characters the right way even thought the characters are mangled in the database, but doesn't do so anymore (not even if i explicitly utf8_encode or utf8_decode the result). I even tried forcing $mysqli->set_charset("UTF8") on every connection to no avail.

Both the web server, the database server,server connection, PHP and the tables use UTF-8 or utf8mb4 charset and collation, and are setup the same way as the old ones.

The only difference I see is that the new database server is MariaDB instead of MySQL and its webserver is nginx instead of Apache.

New database specs picture from phpMyAdmin:

IMAGE

Old database specs picture:

IMAGE

New webserver specs on which the website and PHP runs (same specs as old one but different server): Apache 2.4 PHP 7.0

How can I get back that old correct encoding? Why doesn't PHP automatically decode them right anymore?

UPDATE: Using mb_detect_encoding I see that PHP in both new and old version detects ASCII or UTF-8 on the query results, depending on whether there's at least an UTF-8 symbol or not. The issue is that on the new version PHP doesn't display the UTF-8 symbols right even thought it detects the string encoding as UTF-8.

UPDATE 2: thanks to this question I figured out why my entries were mangled: double encoding arose from the fact that the database collation was latin1_swedish_ci while the tables collation was utf8_general_ci. This doesn't answer the question thought since the old website was automatically "translating" those mangled characters, rendering them right in the HTML, and I want to replicate that behavior into the new website which is a different one but with the same code and php.ini settings.

Kitra
  • 13
  • 6
  • Don't use `utf8_encode`/`utf8_decode` that is for ISO encoded data. Aside from that change the code is the exact same? If you query the two DBs is the data the same before the PHP interaction? – user3783243 Jun 12 '18 at 14:30
  • Yes, queries with phpMyAdmin return the same data. I tried decode/encode just to see if something would happen, without real expectations – Kitra Jun 12 '18 at 15:25
  • MariaDB 10.0.32 is a little older than 5.6.39, but they should be "the same" with respect to this Question. – Rick James Jun 13 '18 at 17:41

3 Answers3

0

I think that you should check for your MariaDB configuration.

First check your php code in order to know if there isn't misleading typo ( but i think it doesn't)

Second, check for your MariaDB database/tables structure [extracted from here ]:

SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;

Third, check for your MariaDB files config (my.cnf)[extracted from here ] :

    [client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4

Then restart your server :

mysql.server restart

Hope it will help you to fix your problem bro.

Bye

Snics
  • 1
  • 3
  • Thanks for the fast response! I modified the MariaDB files config (from utf8 to utf8mb4) but i can't restart the mariadb server from ssh (doesn't recognize "service" command). I'll try to restart the machine and hope it'll work – Kitra Jun 12 '18 at 15:17
  • I'm afraid it doesn't work :( I'm already sure 1st and 2nd points are not the issue (well, regarding the second point, tables use utf8 and not utf8mb4 but so do the old ones) – Kitra Jun 12 '18 at 15:30
  • Well.. That's too bad ! I'm sur you should search trought your config file or something like this.. What about SQL Insert data with special encoding character ? Does it work properly ? – Snics Jun 13 '18 at 13:48
  • after changing database collation to utf8 (which apparently conflicted with the tables collation, shouldn't the collation be cascading?, creating those mangled characters by double encoding) the inserts mantain utf8 compatible records in the database. The point is that the old database had those nasty chars but PHP automatically "translated" them in utf8 compatible one, and my question still remains: why doesn't it do so anymore? – Kitra Jun 13 '18 at 15:46
  • `SHOW VARIABLES LIKE 'char%';` and you need _client, _connection, and _results . – Rick James Jun 13 '18 at 17:42
  • @Kitra - `utf8` and `utf8mb4` are identical for western European characters. So, the change did not help. – Rick James Jun 13 '18 at 17:45
  • @Kitra - The database settings are just _defaults_ for newly created tables. The discrepancy is unimportant. – Rick James Jun 13 '18 at 17:46
  • @RickJames Regarding the `utf8mb4` change: I suspected that, but I literally tried everything else I could think of. Regarding database settings: I thought so too, but since changing it solved the double encoding problems on `INSERT` it turns out that the queries encoded in `latin1` the data due to database collation, only to reencode it to `utf8` due to table collation. Or at least this is the only plausible reason I can think of. – Kitra Jun 14 '18 at 07:28
  • Did you solve your issues ? Drop your DB then try to edit your my.cnf file with those line : **[client]** **default-character-set=utf8** **[mysqld]** **character-set-server=utf8** **collation-server=utf8_unicode_ci** **[mysql]** **default-character-set=utf8** Then check your tables/database collation using SQL request. As @RickJames said you should try to convert your tables collation : Convert to binary format then convert to UTF8 format if your collation aren't updated by previous step. – Snics Jun 14 '18 at 07:38
  • @Snics sadly changing charsets and collation didn't solve my issue. Regarding the conversion fix, I'm afraid to use that since it destroys properly encoded utf8 chars and I'm not sure I don't have any in my db. – Kitra Jun 14 '18 at 08:26
0

Are you expecting the changes on existing data? It will not work. You need to add the data again to see the changes. Remove all the data from the new database and add again.

  • What are you refering to? I didn't make any changes to the data, just copy/pasted via SQL queries from the old database – Kitra Jun 13 '18 at 12:27
  • Delete all the data from the new server database and restore it from your old server database backup. – Iynga Iyngaran Iyathurai Jun 13 '18 at 15:28
  • Of course I already did that multiple times, the problem still persists. – Kitra Jun 13 '18 at 15:36
  • Can't you restore from the terminal? Seems PHPMyAdmin is not using the UTF-8 connection. – Iynga Iyngaran Iyathurai Jun 13 '18 at 16:11
  • I don't think PHPMyAdmin is to blame. If you carefully read my question (I've reworded it to make it a little more clear) the data into the two databases is the same (both with mangled double-encoded chars). The problem seems that the `SELECT` results are not automatically decoded right anymore in either connection or PHP side. – Kitra Jun 14 '18 at 07:41
0

To check for double encoding, use SELECT HEX(col)... é should come back C3A9 (proper utf8), but instead shows C383C2A9 (double encoding).

See: Trouble with UTF-8 characters; what I see is not what I stored

If you have actually determined that you have double encoding, then the fix involves

UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);

See http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Yes, "double encoding" is a silent bug -- two wrongs make a right (sort of).

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks, I already saw your linked answer and it helped me a great deal in knowing what caused this problem. However I'm reclutant using the fix you proposed since it transforms properly encoded utf8 chars into `?` and I can't be sure i don't have any in my database short of manually controlling every record (which is of course unfeasible). Isn't there any other way which targets only `latin1` encoded chars leaving the utf8 ones alone? – Kitra Jun 14 '18 at 07:18
  • If separate _rows_ have separate issues, you will have to write a program to deduce the state of each row and apply different actions to different rows. I might be able to help provide expressions such as `CONVERT(...)` for individual cases. If you can identify which _rows_ need fixing via SQL, then do `UPDATE ... =CONVERT(...) WHERE ...` – Rick James Jun 16 '18 at 17:40
  • Also, carefully check the "Best Practice" in my first link. (It sounds like your many attempts at rebuilding are missing one of the items.) – Rick James Jun 16 '18 at 17:53