I am confused! My web host recently updated PHP, and now my old tables render special characters differently (wrongly). Both my tables and my input/output PHP pages are set to utf-8. Since the update, input from PHP is also treated differently: my special characters are now being utf-8-encoded as they enter the database. So when I review the tables in phpMyAdmin, the old inserts have the original (non-encoded) special characters, while the new posts have utf-8-encoded characters (also special).

So what I would like to do is rewrite input and output to insert and show non-encoded characters, but I am not sure whether this is possible without skipping utf-8 entirely (in PHP and MySQL). Is there a utf-8 way to submit non-encoded characters?

AND - perhaps more fundamentally - I need to understand what the possible downsides are. I am using Danish characters in and out and I'm not going to use any other language (for this project). So if it IS possible to insert and output non-encoded characters using utf-8 - am I then going to have unexpected/destructive issues?

I have read a lot of posts regarding PHP/MySQL/special characters, but I haven't seen this angle on the issue yet. I hope I am not duplicating a question; everything was working very nicely until the update.

morganF
  • If you have a DB for testing, I would try [mb_convert_encoding](http://php.net/manual/en/function.mb-convert-encoding.php). I would recommend trying this only in the test DB until you know it works. – SebHallin Mar 06 '15 at 16:48
  • I don't have a testing db, but possibly I might need one for this reason. Undecided yet. But thanks – morganF Mar 06 '15 at 19:19

1 Answer

Even if you are using only Danish characters, you may as well go utf8 all the way.

There are many places where the encoding needs to be stated:

  • The `<meta>` charset declaration at the top of the HTML
  • The columns in the database (column CHARACTER SET defaults from table, which defaults from database)
  • The encoding in your PHP code.

When you `CREATE TABLE`, tack on `DEFAULT CHARACTER SET utf8`. If you have existing tables without that, speak up; we may need to deal with them. If you want Danish collation, then specify `COLLATE utf8_danish_ci`, too. Then (if I recall correctly), `aa` will sort after `z`. (The default is `utf8_general_ci`, which won't do that sorting.)

Figure out what encoding you have (or can get) in your PHP code. If you have some text with accents in it, do this:

    $hex = unpack('H*', $text);   // hex dump of the raw bytes in $text
    echo implode('', $hex);

If you have utf8, `å` will be `C3A5`; for latin1 it will be `E5`.
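As a quick check of those two hex values, here is a minimal sketch that dumps the bytes of `å` in both encodings. It assumes the `mbstring` extension is available; the helper name `hex_of()` is made up for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper: upper-case hex dump of a string's raw bytes.
function hex_of(string $text): string
{
    return strtoupper(bin2hex($text));
}

// "å" written as explicit UTF-8 bytes, so the result does not depend
// on the encoding this source file happens to be saved in.
$utf8 = "\xC3\xA5";

// The same character transcoded to latin1 (ISO-8859-1); needs mbstring.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo hex_of($utf8), "\n";   // C3A5
echo hex_of($latin1), "\n"; // E5
```

Running the same dump on a string taken from your own form input or database result shows which of the two encodings PHP is actually holding.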

Regardless of what encoding is in the tables, you must call `set_charset('utf8')` or `set_charset('latin1')`, depending on which encoding the data in PHP uses. MySQL will gladly transcode between latin1 and utf8 as things are passed between PHP and MySQL. For different APIs:

⚈  mysql: mysql_set_charset('utf8');
⚈  mysqli: $mysqli_obj->set_charset('utf8');
⚈  PDO: $db = new PDO('mysql:host=host;dbname=db;charset=utf8', $user, $pwd);
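
The mismatch that `set_charset()` prevents can be reproduced without a database. The sketch below assumes the `mbstring` extension and uses Windows-1252 as a stand-in for what MySQL calls latin1 (to my knowledge MySQL's `latin1` is cp1252-based); the function name `misread_as_latin1()` is hypothetical.

```php
<?php
// Hypothetical helper: take UTF-8 bytes, pretend each byte is one
// Windows-1252 character, and re-encode the result as UTF-8.
// This mimics sending UTF-8 text over a connection declared as latin1.
function misread_as_latin1(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

$original = "\xC3\x98";                  // "Ø" in UTF-8 (hex C398)
$garbled  = misread_as_latin1($original);

// Two bytes became four: "Ã" (C383) followed by "˜" (CB9C).
echo bin2hex($garbled), "\n";            // c383cb9c

// Reversing the misreading recovers the original bytes.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
echo bin2hex($repaired), "\n";           // c398
```

If `SELECT HEX(col)` shows four bytes like `C383CB9C` where you expected `C398`, the text was probably stored through a connection whose declared charset did not match the bytes PHP sent.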

For much more info, see http://mysql.rjweb.org/doc.php/charcoll .

Rick James
  • With `utf8_danish_ci`, these sort after `z`, in the clumps shown: `Ä=Æ=ä=æ Ö=Ø=ö=ø Aa=Å=å Þ=þ` – Rick James Mar 06 '15 at 18:32
  • Well, what I am really asking is: is there a way to store the actual special characters in the db using utf-8? What I am getting now is "Ã¦" instead of "æ", "Ã˜" instead of "Ø", etc. This seems stupid to me; I am getting different special characters inserted into the db **when I would rather have "my own" special characters inserted**. As I see it, you are guiding me to work with (and accept) the utf-encoded characters, but I would only like to do that IF I AM CONVINCED THAT IT SERVES A FEASIBLE/REASONABLE PURPOSE - OR IF IT IS UNAVOIDABLE? – morganF Mar 06 '15 at 23:38
  • (It is a common problem, and it is fixable.) The utf8 encoding for `Ø` is hex `C398`. But when that hex is interpreted as latin1, it comes out `Ã˜`. So, the problem is that PHP had bytes in one encoding, but the transmission to/from MySQL was assuming a different encoding. That inconsistency led to an error either on INSERTion or on SELECTing. Do `SELECT HEX(col) ...` to see what is in the table. Then we can pursue where the 'bug' lies. My blog covers the problem and too much more: http://mysql.rjweb.org/doc.php/charcoll . I have provided tidbits of it in this thread. – Rick James Mar 07 '15 at 00:00
  • Sorry - haven't had much time to read up on your blog. But I intend to. So far, many thanks for the elaborate answer. But perhaps you could answer me this: when I look into my database, via MySQL, and I see my desired special characters displaying correctly (those inserted before the update) - are they actually stored correctly (c398, etc.) and then presented the right way because the phpMyAdmin page is built the _right way_ - and my own front end is flawed? OR! Am I supposed to see the utf-8-encoding when I check phpMyAdmin? – morganF Mar 08 '15 at 14:13
  • Right. I read up on the blog, but too many unknowns for me. When I investigated further on "SET NAMES" I saw several warnings about that and stumbled upon recommendations ([stackoverflow-1650591](http://stackoverflow.com/questions/1650591/whether-to-use-set-names/1650834#1650649)) to use `mysql_set_charset('utf8', $link)`. **When I tried that something happened that seems to have affected ALL MY PHP-pages** so the old inserts show correctly, and only the few recent look wrong (the false _latino_-special characters). Not sure if the problem is solved for good - but it's working at the moment... – morganF Mar 08 '15 at 21:37
  • When I read through your answer again, I realised that you suggested exactly the same, only I didn't know what it meant when I read it the first time. Now it jumps out at me: ⚈ mysql: `mysql_set_charset('utf8');` Thanks a bunch – morganF Mar 09 '15 at 23:41
  • See also `bin2hex()`. – Rick James Dec 21 '21 at 17:17