My MySQL database contains Chinese characters and other non-ASCII text. When I view the data in phpMyAdmin, it looks garbled. However, when I display it on my website with PHP using the regular mysqli API, it looks fine, so I assume the data is stored properly in the database and that the server connection collation may be incorrect.

My PHP code for opening the database connection is:

function openConnection(): mysqli
{
    $databaseHost = "localhost";
    $databaseUser = "root";
    $databasePassword = '';
    $databaseName = "my-database-name";

    $connection = new mysqli($databaseHost, $databaseUser,
        $databasePassword, $databaseName);

    if ($connection->connect_error) {
        die("Connection failed: " . $connection->connect_error);
    }

    return $connection;
}

My phpMyAdmin server connection collation is the default utf8mb4_unicode_ci, which seems reasonable. My tables were also created with the default utf8mb4_general_ci. Shouldn't that work for any input users might make?

Calling $connection->get_charset() in PHP also returns the correct charset:

If I export the database in phpMyAdmin, the export is also garbled in Notepad++, even though I made sure to view it with UTF-8 encoding. If I import the garbled export again, phpMyAdmin still shows the data as garbled, and now the website shows it as garbled too. So in this case the export really was corrupted.

How can I solve this encoding problem? PHP clearly handles UTF-8 properly, my Apache web server serves UTF-8, and my database seems to be configured correctly, yet there is an issue with phpMyAdmin or with the database/table collation.

BullyWiiPlaza
  • Try using CLI. If the text is garbled in CLI then it's nothing to do with phpMyAdmin. – Dharman Jul 07 '22 at 10:09
  • I find it more likely that your PHP code is broken and only accidentally shows the correct data. If I were you, I would fix the PHP code. Make sure you set the correct connection charset in mysqli. – Dharman Jul 07 '22 at 10:11
  • @Dharman: I added the PHP code for opening the connection. How would you adapt it? I see what you mean. I use the same PHP code to insert data into the database. Maybe forcing the right charset in PHP instead of leaving it unspecified is the solution. I added the used default charset to the question as well. – BullyWiiPlaza Jul 07 '22 at 12:33
  • You need to stop manually checking for errors. Please read: [Should we ever check for mysqli_connect() errors manually?](https://stackoverflow.com/q/58808332/1839439) and [Should I manually check for errors when calling “mysqli_stmt_prepare”?](https://stackoverflow.com/q/62216426/1839439) – Dharman Jul 07 '22 at 13:09
  • It looks like you have the correct charset already. Something in your process must be garbling up the data. Maybe you have `utf8_encode` somewhere or some other useless function. It's hard to say. You need to try and narrow it down – Dharman Jul 07 '22 at 13:10
  • Please [edit] your question to extend your [mcve]. Share some examples of expected and garbled text. – JosefZ Jul 07 '22 at 13:42

2 Answers

1

It turned out the issue was entirely elsewhere: I'm supplying data to PHP from C++ code. The C++ code uses the nlohmann JSON library to build the data submitted to the PHP script. The problem was that I did not explicitly encode my std::strings as UTF-8, as described here, when putting data into a C++ JSON object. With that fixed, everything now works as expected.
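
For anyone hitting something similar, one way to catch malformed input early is to validate the request body on the PHP side before it reaches the database. This is only a sketch: the function name decodeUtf8Json is hypothetical, it assumes the mbstring extension and PHP 7.3+, and it only rejects byte sequences that are not valid UTF-8. Text that was encoded twice is still valid UTF-8 and would pass this check.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decode a JSON request body, but only if the raw
 * bytes are valid UTF-8. Throws instead of silently storing bad data.
 */
function decodeUtf8Json(string $rawBody): array
{
    // Reject bodies that contain byte sequences which are not valid UTF-8.
    if (!mb_check_encoding($rawBody, 'UTF-8')) {
        throw new InvalidArgumentException('Request body is not valid UTF-8');
    }

    // JSON_THROW_ON_ERROR makes json_decode raise a JsonException on bad JSON.
    $data = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        throw new InvalidArgumentException('Expected a JSON object or array');
    }

    return $data;
}
```

In a script this could be called as decodeUtf8Json(file_get_contents('php://input')), with the exception turned into an HTTP 400 response.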

BullyWiiPlaza
0
⚈  If using mysqli, do $mysqli_obj->set_charset('utf8mb4');
⚈  If using PDO, do something like $db = new PDO('mysql:host=host;dbname=db;charset=utf8mb4', $user, $pwd);
⚈  Alternatively, execute SET NAMES utf8mb4

Any of these tells MySQL that the bytes in the client are UTF-8 encoded. Conversion, if necessary, will occur between the client and the database if the column definition is something other than utf8mb4.
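
As a rough sketch of the mysqli variant, the connection function from the question could be adapted as follows. The host, user, password, and database name are the placeholders from the question, and the mysqli_report() call is an assumption that follows the advice in the comments about not checking connection errors manually.

```php
<?php
declare(strict_types=1);

/**
 * Open a mysqli connection that explicitly declares utf8mb4 as the
 * client charset instead of relying on the server default.
 */
function openConnection(): mysqli
{
    // Let mysqli throw exceptions on failure, so no manual
    // connect_error check is needed.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    // Placeholder credentials taken from the question.
    $connection = new mysqli('localhost', 'root', '', 'my-database-name');

    // Declare that the bytes sent and received by this client are utf8mb4.
    $connection->set_charset('utf8mb4');

    return $connection;
}
```

Setting the charset right after connecting means every later query on that connection uses it, for reads and for writes.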

More notes on PHP: http://mysql.rjweb.org/doc.php/charcoll#php

If you have specific garbling, see Trouble with UTF-8 characters; what I see is not what I stored
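
One common form of garbling is double encoding: UTF-8 bytes get treated as a single-byte charset and are converted to UTF-8 a second time. The sketch below imitates that outside the database, assuming the mbstring extension. The function names are hypothetical, and ISO-8859-1 only approximates MySQL's latin1, which is really cp1252 and differs for bytes 80 to 9F.

```php
<?php
declare(strict_types=1);

/** Treat UTF-8 bytes as ISO-8859-1 and encode them as UTF-8 again. */
function doubleEncode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

/** Reverse doubleEncode(): map each character back to a single byte. */
function undoDoubleEncode(string $garbled): string
{
    return mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
}

$original = "\u{4E2D}";              // one Chinese character, 3 bytes
$garbled  = doubleEncode($original); // now 3 characters, 6 bytes

echo bin2hex($original), PHP_EOL;    // e4b8ad
echo bin2hex($garbled), PHP_EOL;     // c3a4c2b8c2ad
var_dump(undoDoubleEncode($garbled) === $original); // bool(true)
```

Hex like C3 A4 C2 B8 C2 AD where E4 B8 AD was expected suggests the text was encoded twice somewhere along the way.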

If you suspect the data being fed from PHP to Notepad, dump a few Chinese characters in hex and show them to us. I would expect every 4th byte to be hex F0 or every 3rd to be between E3 and EA. (These are the first bytes of the 4-byte and 3-byte UTF-8 encodings of Chinese characters.)
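
A minimal way to produce such a hex dump in PHP might look like this. The helper names hexDump and countLeadBytes are made up for illustration, and the sample string is written with \u{...} escapes so the result does not depend on the encoding of the source file.

```php
<?php
declare(strict_types=1);

/** Return the bytes of $text as space-separated uppercase hex pairs. */
function hexDump(string $text): string
{
    return strtoupper(implode(' ', str_split(bin2hex($text), 2)));
}

/**
 * Count bytes that look like the first byte of a 3-byte (E3..EA)
 * or 4-byte (F0) UTF-8 sequence.
 */
function countLeadBytes(string $text): array
{
    $counts = ['three_byte' => 0, 'four_byte' => 0];
    foreach (str_split($text) as $byte) {
        $value = ord($byte);
        if ($value >= 0xE3 && $value <= 0xEA) {
            $counts['three_byte']++;
        } elseif ($value === 0xF0) {
            $counts['four_byte']++;
        }
    }
    return $counts;
}

$sample = "\u{4E2D}\u{6587}";   // two Chinese characters
echo hexDump($sample), PHP_EOL; // E4 B8 AD E6 96 87
print_r(countLeadBytes($sample));
```

For properly encoded text, the number of lead bytes should match the number of characters; here that is two 3-byte lead bytes for two characters.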

Does Notepad properly handle UTF-8, or does it need a setting?

If you are in "cmd" on Windows, you may need chcp 65001 (see http://mysql.rjweb.org/doc.php/charcoll#entering_accents_in_cmd). That way, more non-English characters will display correctly.

Rick James