
Character encoding has always been a problem for me. I never really understand when I'm supposed to deal with it.

All the databases I use now are set up with `utf8_general_ci`, as that seems to be a good 'general' starting point. In the past five minutes I have learned that it is case-insensitive, so that's helpful.

But my question is: when should I use `utf8_encode` and `utf8_decode`? As far as I can tell, if I `$_POST` a form from a table on my website, I need to `utf8_encode()` the value before I insert it into the database.

Then, when I pull it out, I need to `utf8_decode()` it. Is that the case, or am I missing something?

Chud37
  • I further recommend *[What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/)*. – deceze Sep 17 '15 at 14:04
  • Make your site UTF-8 only, then you never have to worry about encoding and decoding again. The `utf8_general_ci` is only the collation and doesn't have any effect on how data is stored. I made a small [overview](http://www.martinstoeckli.ch/php/php.html#utf8) of the necessary steps. – martinstoeckli Sep 17 '15 at 14:16
  • @martinstoeckli To be terribly pedantic... if the collation is `utf8_blablabla`, that implies that the column encoding must be `utf8`... correlation != causation, but they're strongly linked in this case... ;) – deceze Sep 17 '15 at 14:24
  • @deceze - OK, I didn't know that. To make it clearer: the _charset_ defines how data is stored, and the _collation_ defines how queries compare values, e.g. to sort the rows. I didn't know that a `utf8_*` collation also requires a UTF-8 charset; I thought I remembered a charset of iso-8859-1 with a collation of utf8_general_ci, but maybe I'm wrong there. – martinstoeckli Sep 17 '15 at 14:34
  • @martinstoeckli I *think* that the set of valid collations is dictated by the charset. Perhaps *some* can be mixed if the charset is a subset of the collation, I'm not sure about that. – deceze Sep 17 '15 at 14:36

2 Answers


`utf8_encode` and `utf8_decode` are pretty bad misnomers. The only thing these functions do is convert between the ISO-8859-1 and UTF-8 encodings. They do exactly the same thing as `iconv('ISO-8859-1', 'UTF-8', $str)` and `iconv('UTF-8', 'ISO-8859-1', $str)` respectively. There's no other magic going on that would necessitate their use.
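To make the "no other magic" point concrete, here is a minimal sketch of the ISO-8859-1 to UTF-8 mapping done by hand. `latin1ToUtf8` is a hypothetical helper written for illustration, not a PHP built-in. It relies on the fact that every ISO-8859-1 byte value equals its Unicode code point, so bytes below 0x80 pass through unchanged and the rest become two UTF-8 bytes. (`utf8_encode` and `utf8_decode` are also deprecated as of PHP 8.2, which is one more reason to reach for `iconv` or `mb_convert_encoding`.)

```php
<?php
// Hypothetical helper: a hand-written equivalent of what utf8_encode() does.
function latin1ToUtf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            // ASCII range: identical in ISO-8859-1 and UTF-8
            $out .= chr($b);
        } else {
            // 0x80..0xFF: byte value == code point U+0080..U+00FF,
            // which UTF-8 encodes as two bytes 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// "\xE9" is "é" in ISO-8859-1; in UTF-8 it is the two bytes C3 A9.
var_dump(latin1ToUtf8("caf\xE9") === "caf\xC3\xA9"); // bool(true)
```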

If you receive a UTF-8 encoded string from the browser and you want to insert it as UTF-8 into the database over a connection whose charset is set to utf8, there is absolutely no use for either function anywhere in this chain. You don't want to convert encodings at all here; avoiding conversion should be the goal.
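A sketch of what goes wrong if you call the conversion anyway, assuming the browser has already sent UTF-8. `iconv('ISO-8859-1', 'UTF-8', ...)` stands in for `utf8_encode()`, and the variable names and form field are hypothetical:

```php
<?php
// Assumption: the form page is served as UTF-8, so "café" arrives as UTF-8 bytes.
$fromBrowser = "caf\xC3\xA9"; // stand-in for $_POST['name'] (hypothetical field)

// Wrong: reinterpreting those UTF-8 bytes as ISO-8859-1 and converting them
// (which is what utf8_encode() does) double-encodes the text.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $fromBrowser);
var_dump($doubleEncoded === "caf\xC3\x83\xC2\xA9"); // bool(true); displays as "cafÃ©"

// Right: the value is already UTF-8. With a UTF-8 DB connection (e.g. a PDO DSN
// containing ";charset=utf8"), bind it as-is in a prepared statement.
$toStore = $fromBrowser;
```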

The only time you could use either function is when you need to convert from UTF-8 to ISO-8859-1 or vice versa at some point, because external data is in that encoding or an external system expects data in it. But even then, I'd prefer the explicit use of `iconv` or `mb_convert_encoding`, since it makes it obvious what is going on. And in this day and age, UTF-8 should be the default encoding you use throughout, so there should be very little need for such conversion.
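If you do have legacy ISO-8859-1 data, one way to keep the conversion explicit is to do it exactly once, at the boundary where that data enters or leaves the application. `fromLegacy` and `toLegacy` are hypothetical helper names, and this sketch assumes the mbstring extension is available:

```php
<?php
// Hypothetical boundary helpers (require the mbstring extension).
// Convert once where legacy ISO-8859-1 data enters or leaves; use UTF-8 inside.
function fromLegacy(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

function toLegacy(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$legacy = "espa\xF1ol";          // "español" as ISO-8859-1 bytes
$utf8   = fromLegacy($legacy);   // "espa\xC3\xB1ol", i.e. UTF-8
var_dump(toLegacy($utf8) === $legacy); // bool(true)
```

Characters that don't exist in ISO-8859-1 (for example `€`) can't survive `toLegacy`, which is another reason to keep everything in UTF-8 internally.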

See:

Dharman
deceze

Basically, `utf8_encode` encodes an ISO-8859-1 string to UTF-8. When you are translating text from one language to another, you may have to use this function to prevent garbage characters from being displayed.

For example, when you display Spanish characters, the script sometimes doesn't recognize them and displays garbage characters instead.

In that case you can use `utf8_encode`.

For more information, see:

http://php.net/manual/en/function.utf8-encode.php
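If you really can't control the input encoding, a common heuristic is to convert only strings that are not already valid UTF-8, so that correct UTF-8 is never double-encoded. This is a sketch under the assumption that anything invalid as UTF-8 is ISO-8859-1. `ensureUtf8` is a hypothetical helper, and a Latin-1 string whose bytes happen to also be valid UTF-8 will be left unchanged:

```php
<?php
// Hypothetical helper: normalise input to UTF-8, assuming any string that is
// not valid UTF-8 is ISO-8859-1 (a heuristic, not a guarantee).
function ensureUtf8(string $s): string
{
    // With the /u modifier, preg_match() returns false for invalid UTF-8 input;
    // for valid input the empty pattern always matches and it returns 1.
    if (preg_match('//u', $s) === 1) {
        return $s; // already UTF-8: leave it alone to avoid double-encoding
    }
    return iconv('ISO-8859-1', 'UTF-8', $s);
}

echo ensureUtf8("espa\xF1ol"), "\n";     // ISO-8859-1 input, converted
echo ensureUtf8("espa\xC3\xB1ol"), "\n"; // UTF-8 input, passed through
```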

Hardik Patel