1

I've seen several questions asked on this topic but here's the short of the long: I have a db created in phpmyadmin that's using utf8_general_ci collation and rows in some tables can have either English or Japanese text. I am displaying out these rows using a php mysqli query, but am getting gibberish out from it.

The legwork I've done includes the following:

  • Adding in a meta tag for the char set for the HTML header
    meta http-equiv="Content-Type" content="text/html; charset=UTF8"

  • Explicitly telling mysqli connection to use UTF-8 via set_charset() and also sanity checking it with character_set_name()

  • Verifying the collation of the database/pertinent tables/columns

  • Adding in some Japanese text extraneous of any mysql queries to verify the browser can render it

  • Making sure the php file(s) are UTF-8 encoded

  • Verifying the Japanese input is being taken properly to the database and in the php

  • Sanity check the database with a comparable test database using utf8_general_ci collation (in case the problem db/table/column was originally made with a different collation and changing it caused existing rows to be garbled)

  • Verify the database collation and charset in the information_schema table, such that charset is utf8 and the collation is utf8_general_ci

  • In the mysqli query attempt to convert the pertinent column to utf8 (this likes to throw an error, so it seems invalid)

All of the above checks out and I still get gibberish, hence I ask the question of if it's possible that the rows returned in a query can not match up with the collation of the db/table/column causing it to be rendered incorrectly.

For completeness, this is being done in a wampserver3 deployment with php 5.6.25, mysql 5.7.14, and phpmyadmin 4.6.4. The column that can take English or Japanese is set to TEXT type.

I'm completely stumped with this.

SimpleBeat
  • 747
  • 1
  • 10
  • 20
  • 1
    Are you sure the font is supporting japanese characters? Also, could you show an image of how the "gibberish" looks? – junkfoodjunkie May 10 '17 at 22:45
  • 1
    You may find it helpful to use `HEX(column)` in SQL queries to determine whether the column contains the characters you think it contains. It might make sense for you to [edit] your question to show some of that stuff. Also, may I be pendantic? This is a question about character sets, not collations. – O. Jones May 11 '17 at 00:04
  • There are 5 different types of gibberish; which do you have? See [_this_](http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored) for diagnosing your problem. – Rick James May 11 '17 at 04:16
  • When you get the `HEX`, you should see character as a group of 3 bytes: `E381yy` for Hiragana, `E383yy` for Katakana, or perhaps `EAB0yy`-`ED9Eyy`. Kanji would be other `Exyyzz`. – Rick James May 11 '17 at 04:22

1 Answers1

0

First, thanks for the responses in general and second, sorry for my lack of response to everyone's questions in trying to help out. Put the original question up later in the day and just went and slept on it. I ended up finding out was going on.

So as it turns out, mysql's UTF8 collation (utf8_general_ci, unicode_ci, etc.) aren't truly supporting the full number of possible character codes - it's only about 6% of them per this article (and also how to change the underlying mysql client/server and mysqld configuration via a my.cnf if using a X/LAMP. wampserver, like in my case, would require modifying the my.ini for mysql in a similar manner). The short of the long: mysql collation and php need to use utf8mb4 charset, not utf8.

On the php side, rather than using a header meta element, I just tell the mysqli connection to use utf8mb4 charset ($somemysqliconn->set_charset("utf8mb4")). Alternatively, you could also use header('Content-Type: text/html; charset=utf8mb4').