
The table started out as latin1, then utf8, and I have now converted it using "ALTER TABLE [table] CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"

SHOW FULL COLUMNS FROM [table]; shows that the collation is indeed utf8mb4_unicode_ci, and mb_detect_encoding() on my string in PHP reports UTF-8 before the insert.

In PHP I echo the word containing ó and it displays correctly, but after the insert I can see in the database that the character has been changed. On the PHP side I have tried setting the connection charset to utf8mb4 with mysql_set_charset('utf8mb4',$cn); and mysql_query("SET NAMES utf8mb4");, as well as various encoding conversions.

If I hard-code the string to be inserted in the script using Notepad++ (with its encoding set to UTF-8), whether it's ó or Σ, it is inserted without issue. The problem arises only when the string is gathered from a real data source.
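
One way to compare the hard-coded string with the one from the real source is to look at the raw bytes rather than at how they render. The helper below is only a sketch; `describe_o_acute` is a hypothetical name, and it assumes the character in question is ó.

```php
<?php
// Hypothetical helper: report how "ó" is represented in a byte string.
// Byte sequences are written as escapes so this file's own encoding is irrelevant.
function describe_o_acute($s)
{
    if (strpos($s, "\xC3\x83\xC2\xB3") !== false) {
        return 'double-encoded UTF-8'; // renders as "Ã³" when read as UTF-8
    }
    if (strpos($s, "\xC3\xB3") !== false) {
        return 'UTF-8';
    }
    if (strpos($s, "\xF3") !== false) {
        return 'single-byte (latin1/cp1252)';
    }
    return 'no ó found';
}

// Example input; in practice pass the value fetched from the source.
$value = "Dep\xC3\xB3sito";
echo bin2hex($value), "\n";           // raw bytes as hex
echo describe_o_acute($value), "\n";  // which representation was found
```

Running this on both strings just before the insert should show whether they really contain the same bytes.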

The last thing I tried was creating a new database with a UTF-8 character set, thinking that having only the table in UTF-8 might not be enough, but the issue is present there too.

I've tried everything I could find out there on this topic but nothing seems to have worked. Any and all help would be appreciated.

user3653863
  • Have you set the browser to use utf-8 also? – RiggsFolly Jan 29 '18 at 17:28
  • If I use the meta charset tag at the top of my page the browser displays ó instead of ó – user3653863 Jan 29 '18 at 17:37
  • @RiggsFolly If you mean the .php file itself then yes – user3653863 Jan 29 '18 at 17:39
  • **WARNING**: If you're just learning PHP, please, do not learn the obsolete [`mysql_query`](http://php.net/manual/en/function.mysql-query.php) interface. It's awful and has been removed in PHP 7. A replacement like [PDO is not hard to learn](http://net.tutsplus.com/tutorials/php/why-you-should-be-using-phps-pdo-for-database-access/) and a guide like [PHP The Right Way](http://www.phptherightway.com/) helps explain best practices. Make **sure** your user parameters are [properly escaped](http://bobby-tables.com/php) or you will end up with severe [SQL injection bugs](http://bobby-tables.com/). – tadman Jan 29 '18 at 17:49
  • @tadman I'm afraid I have been tasked on working on an older, rather large site so converting everything to PDO is not feasible. I am using PDO on a newer site we have though and quite like it. All queries go through a cleaning function rather than being run directly so I believe we should be safe on that front – user3653863 Jan 29 '18 at 17:51
  • Hopefully you can roll this code over sooner than later as in PHP 7 all those functions were permanently removed from PHP. Most "cleaning functions" are completely inadequate and still expose you to serious risk. Tools like [SQLMap](http://sqlmap.org) might show you how exposed you are if you're curious. It's very good at finding a tiny hole and cranking it wide open. – tadman Jan 29 '18 at 18:19
  • I don't think that's what @RiggsFolly meant. If you're gathering the data from a submitted form and you don't declare a charset for the web page in HTML, some browsers don't handle utf8 characters correctly, in my experience. – Cemal Jan 29 '18 at 18:26
  • @Cemal The source comes from a database, and that table is latin. Is the mb_detect_encoding() function not accurate when it reports UTF-8, then? On the processing page I had set at the top at Riggs's suggestion but found no difference – user3653863 Jan 29 '18 at 18:38
  • Have you checked [mysql charsets and collations](https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html) – Cemal Jan 29 '18 at 18:43
  • Quoting from it: "For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection, except for string literals that have an introducer (for example, _utf8mb4 or _latin2). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence." – Cemal Jan 29 '18 at 18:45
  • The collation of the columns has been set to utf8 so it should take precedence then, yes? When I ran the alter table convert to character set command up above it said it had affected the number of rows I had in the table. Since my last update though I found this https://github.com/neitanod/forceutf8 class through another post. Inserting as-is, encoding to UTF8 and then inserting both had the expected issue, but converting to latin1 before inserting caused the character to be inserted properly. It works, but the only way that makes sense to me is if the alteration didn't properly take place. – user3653863 Jan 29 '18 at 19:13
  • That's called "Mojibake". See [_this_](https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored) for discussion of what causes it and what to do about it. – Rick James Jan 29 '18 at 22:13
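
The double-encoding form of Mojibake mentioned in the comments can be reproduced without a database. The sketch below is only an illustration, not a diagnosis of this particular site: it assumes the UTF-8 bytes of ó are treated as latin1 somewhere along the way and converted to UTF-8 a second time. ISO-8859-1 stands in for MySQL's latin1 here, which is a simplification that makes no difference for these two bytes.

```php
<?php
// "ó" as its two UTF-8 bytes, written as escapes so the file encoding is irrelevant.
$original = "\xC3\xB3";

// Misinterpretation step: treat each UTF-8 byte as a latin1 character
// and encode it to UTF-8 again (double encoding).
$mojibake = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n"; // c3b3
echo bin2hex($mojibake), "\n"; // c383c2b3, which renders as "Ã³"

// The damaged string is still valid UTF-8, so a validity check cannot flag it.
var_dump(mb_check_encoding($mojibake, 'UTF-8')); // bool(true)

// Reversing the step recovers the original bytes.
$repaired = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original); // bool(true)
```

This would also be consistent with the observation that converting to latin1 before inserting "fixes" the value: the conversion undoes an extra encoding step instead of removing its cause.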
