When I moved from shared PHP/MySQL hosting to my own VPS, I found that code which outputs user names stored as UTF-8 in a MySQL database prints ?�??????� instead of 鬼神❗. My page is UTF-8 encoded: I have `default_charset = "UTF-8"` in php.ini, `header('Content-Type: text/html; charset=utf-8');` in my PHP file, and `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">` in its HTML part.

My database has the collation utf8_bin, and so does the table. On both the previous and the current hosting, phpMyAdmin shows this record as mojibake (it starts with é¬¼ç¥ž) rather than 鬼神❗. When I create an ANSI text file in Notepad++, paste that mojibake into it, and choose Encoding -> Encode in UTF-8, I see 鬼神❗, so I suppose it is a correctly encoded UTF-8 string.

Then I added

```
init_connect='SET collation_connection = utf8_general_bin'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_bin
skip-character-set-client-handshake
```

to my.cnf, and now my page shows the same mojibake (starting with é¬¼ç¥ž) instead of ?�??????�. This is the same output I get in phpMyAdmin on both hostings, so I'm on the right track. Still, on my old hosting the same PHP script returns a UTF-8 web page with the name 鬼神❗, while on the new hosting it returns the mojibake. It looks like the string is UTF-8 encoded twice: I take the mojibake, give it to Notepad++ as an ANSI string, and Notepad++ turns it into the correct UTF-8 string.

However, when I try `utf8_encode()` I get й¬ÑзÒÑвÑâ, and `utf8_decode()` returns ?�???????. `mb_convert_encoding($name, "UTF-8", "ISO-8859-1");` and `iconv("ISO-8859-1", "UTF-8", $name);` return the same result.
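The double-encoding suspicion can be checked in isolation. The following is a minimal sketch, not taken from the original scripts: it assumes the `mbstring` extension is available and uses only the first character, 鬼, because its UTF-8 bytes (E9 AC BC) mean the same thing in ISO-8859-1 and Windows-1252. The helper names `doubleEncode` and `undoDoubleEncode` are made up for illustration.

```php
<?php
// Hypothetical helpers illustrating "UTF-8 encoded twice" and its reversal.

// Treat the UTF-8 bytes of $s as single-byte characters in $legacy
// and encode each of them as UTF-8 again (this is what produces mojibake).
function doubleEncode(string $s, string $legacy = 'ISO-8859-1'): string
{
    return mb_convert_encoding($s, 'UTF-8', $legacy);
}

// Reverse the step above: map each character back to one byte in $legacy,
// which restores the original UTF-8 byte sequence.
function undoDoubleEncode(string $s, string $legacy = 'ISO-8859-1'): string
{
    return mb_convert_encoding($s, $legacy, 'UTF-8');
}

$original = "\xE9\xAC\xBC";            // UTF-8 bytes of 鬼
$mojibake = doubleEncode($original);   // renders as é¬¼ on a UTF-8 page
$restored = undoDoubleEncode($mojibake);

echo bin2hex($original), "\n";  // e9acbc
echo bin2hex($mojibake), "\n";  // c3a9c2acc2bc
echo bin2hex($restored), "\n";  // e9acbc
```

The other two characters contain bytes in the 0x80 to 0x9F range (9E, 9D, 97), where ISO-8859-1 and Windows-1252 disagree, which may be why the ISO-8859-1 based calls did not round-trip. Passing `'Windows-1252'` as `$legacy` is worth trying, though byte 0x9D has no Windows-1252 character and may not survive every converter.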

So how could I reproduce the same conversion Notepad++ does? See answer below.

Tertium
  • What MySQL calls UTF-8 is not UTF-8, but utf8mb4 is. See for example https://mathiasbynens.be/notes/mysql-utf8mb4 – Micha Wiedenmann Oct 22 '18 at 15:56
  • @MichaWiedenmann Currently it is not that important, and moreover it somehow DOES WORK on my old hosting: even though it is not real UTF-8, I see Japanese symbols. – Tertium Oct 22 '18 at 16:11
  • Does the PHP code that establishes the database connection set the encoding explicitly, or does it just rely on server defaults? What's the charset/collation of the database tables themselves? – Álvaro González Oct 22 '18 at 16:51
  • It's possible something became corrupted when you migrated your data. If you do `SELECT column, HEX(column)` on the same rows in both the shared and VPS databases, you'll be able to see whether this happened. – O. Jones Oct 22 '18 at 17:30
  • See "question mark" and "black diamond" and "Mojibake" in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Oct 25 '18 at 03:26
  • Keep in mind that the `root` user does not execute `init_connect`. – Rick James Oct 25 '18 at 03:30

1 Answer

The solution was simple yet not obvious to me, as I never saw my.cnf on that shared hosting: it seems that the old server had settings like the following:

```
# latin1 is MySQL's name for its cp1252-based character set
init_connect='SET collation_connection = latin1_swedish_ci'
init_connect='SET NAMES latin1'
character-set-server=latin1
```

So, to avoid breaking other code on my new server, I have to put `mysql_query("SET NAMES latin1");` (latin1 is MySQL's name for cp1252) at the top of each PHP script that works with UTF-8 strings. The trick is that the script then receives the string as is (as ANSI bytes) and outputs it unchanged, and because the browser is told that the page is UTF-8 encoded, it simply renders those bytes as UTF-8.
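The `mysql_*` functions were removed in PHP 7, so on current PHP versions the same per-connection idea would have to be expressed with PDO or mysqli. A minimal sketch, assuming PDO with the MySQL driver: the helper `buildDsn`, the host, the database name, the credentials, and the `users`/`name` table and column are all placeholders, not details from the original setup.

```php
<?php
// Hypothetical helper: build a PDO DSN that fixes the connection charset
// for this script only, leaving the server-wide my.cnf defaults untouched.
function buildDsn(string $host, string $dbName, string $charset): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// latin1 is MySQL's name for its cp1252-based character set.
$dsn = buildDsn('localhost', 'mydb', 'latin1');
echo $dsn, "\n";   // mysql:host=localhost;dbname=mydb;charset=latin1

// Usage (placeholder credentials and schema; needs a reachable MySQL server):
// $pdo  = new PDO($dsn, 'user', 'password', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $name = $pdo->query('SELECT name FROM users LIMIT 1')->fetchColumn();
```

This remains a workaround for data that is already stored double-encoded; the `HEX()` check suggested in the comments would confirm whether that is the case.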

Tertium