9

When you have data in a charset other than UTF-8 and need to encode it as JSON to migrate it to a database, PHP offers two functions you can call: utf8_encode() and iconv(). I would like to know which one performs better, and when it is appropriate to use one or the other.

kenorb
Pedro Teran

2 Answers

15

when you have a charset different of UTF-8

Nope - utf8_encode() is suitable only for converting an ISO-8859-1 string to UTF-8. iconv() supports a vast number of source and target encodings.
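To illustrate the difference, here is a minimal sketch of converting a string from a known source charset to UTF-8 with iconv() before passing it to json_encode(). The helper name to_utf8() and the sample data are assumptions made for illustration, not part of any library.

```php
<?php
// Hypothetical helper: convert a string from a known source charset to UTF-8.
function to_utf8(string $text, string $sourceCharset): string
{
    $converted = iconv($sourceCharset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// "café" in ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";
$utf8 = to_utf8($latin1, 'ISO-8859-1');

// json_encode() needs valid UTF-8 input; by default it escapes non-ASCII
// characters as \uXXXX sequences.
$json = json_encode(['name' => $utf8]);
echo $json, "\n"; // prints {"name":"caf\u00e9"}
```

The same helper would accept any other source charset that the local iconv implementation supports, which is something utf8_encode() cannot do.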

Re performance, I have no idea how utf8_encode() works internally and what libraries it uses, but my prediction is there won't be much of a difference - at least not on "normal" amounts of data in the bytes or kilobytes. If in doubt, do a benchmark.
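A rough benchmark along those lines could look like the sketch below. The helper time_conversion(), the sample text, and the iteration count are all assumptions chosen for illustration; results will vary with the PHP version and the underlying iconv implementation. Note that utf8_encode() is deprecated as of PHP 8.2, so the sketch only times it where it still exists.

```php
<?php
// Hypothetical helper: time $iterations calls of $fn and return elapsed seconds.
function time_conversion(callable $fn, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// Sample data (an assumption): about 9 KB of ISO-8859-1 text with accented bytes.
$input = str_repeat("caf\xE9 na\xEFve r\xE9sum\xE9 ", 500);
$iterations = 10000;

$timings = [
    'iconv' => time_conversion(
        fn(string $s) => iconv('ISO-8859-1', 'UTF-8', $s),
        $input,
        $iterations
    ),
];

// utf8_encode() is deprecated as of PHP 8.2, so only time it where it exists,
// and silence the deprecation notice with @.
if (function_exists('utf8_encode')) {
    $timings['utf8_encode'] = time_conversion(
        fn(string $s) => @utf8_encode($s),
        $input,
        $iterations
    );
}

foreach ($timings as $name => $seconds) {
    printf("%-12s %.4f s\n", $name, $seconds);
}
```

Running it several times and on realistic input sizes gives a better picture than a single run.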

I tend to use iconv() because it's clearer that there is a conversion from character set A to character set B.

Also, iconv() provides more detailed control over what to do when it encounters invalid data. Appending //IGNORE to the target character set will cause it to silently drop invalid characters. This may be helpful in certain situations.
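As a small sketch of those suffixes, the code below converts a UTF-8 string containing a euro sign, which ISO-8859-1 cannot represent, in three ways: with no suffix, with //IGNORE, and with //TRANSLIT. The wrapper try_iconv() is a hypothetical helper. The actual results depend on the PHP version and the iconv implementation in use, so the code checks for failure rather than assuming a particular output.

```php
<?php
// A UTF-8 string containing a character (the euro sign, U+20AC) that
// ISO-8859-1 cannot represent.
$utf8 = "Price: 10 \u{20AC}";

// Hypothetical wrapper: run iconv() with warnings silenced and return null
// instead of false when the conversion fails.
function try_iconv(string $from, string $to, string $text): ?string
{
    $result = @iconv($from, $to, $text);
    return $result === false ? null : $result;
}

$results = [
    // No suffix: many implementations reject the unrepresentable character.
    'strict'   => try_iconv('UTF-8', 'ISO-8859-1', $utf8),
    // //IGNORE: asks iconv to drop what it cannot convert (may still fail).
    'ignored'  => try_iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8),
    // //TRANSLIT: asks iconv to substitute a similar-looking replacement.
    'translit' => try_iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8),
];

foreach ($results as $mode => $value) {
    // bin2hex() makes the resulting bytes visible regardless of terminal charset.
    echo $mode, ': ', $value === null ? '(conversion failed)' : bin2hex($value), "\n";
}
```

Checking the return value against false matters here, because a failed conversion otherwise passes silently into later code.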

Pekka
  • Also, you can use //TRANSLIT to generate this type of conversion, u\00f, which Java decodes automatically to ISO-8859-1. But I'm not sure whether the same will happen if different charsets were encoded to UTF-8. – Pedro Teran Feb 29 '12 at 13:05
  • Note that PHP >= 5.4.0 will now fail on invalid characters, even with the `//IGNORE` flag: https://bugs.php.net/bug.php?id=61484 – dotancohen May 07 '13 at 07:46
0

I recommend writing your own function. It will be 2-3 lines long, and it will be better than struggling with locale, iconv, and similar issues.

For example: Fix Turkish Charset Issue Html / PHP (iconv?)
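A hand-written converter of this kind might look like the sketch below. It is an assumption about what such a function could be, not the code from the linked question: the name turkish_to_utf8() is hypothetical, and the map covers only the Turkish letters of ISO-8859-9. Any other non-ASCII byte passes through unchanged, so the approach only works when the set of possible input characters is known in advance.

```php
<?php
// Hypothetical hand-written converter: maps the ISO-8859-9 bytes of the
// Turkish letters to their UTF-8 encodings. Other bytes are left untouched.
function turkish_to_utf8(string $text): string
{
    $map = [
        "\xC7" => "\u{00C7}", "\xE7" => "\u{00E7}", // Ç ç
        "\xD0" => "\u{011E}", "\xF0" => "\u{011F}", // Ğ ğ
        "\xDD" => "\u{0130}", "\xFD" => "\u{0131}", // İ ı
        "\xD6" => "\u{00D6}", "\xF6" => "\u{00F6}", // Ö ö
        "\xDE" => "\u{015E}", "\xFE" => "\u{015F}", // Ş ş
        "\xDC" => "\u{00DC}", "\xFC" => "\u{00FC}", // Ü ü
    ];
    // strtr() with an array replaces each key in a single pass, so bytes
    // produced by one replacement are never replaced again.
    return strtr($text, $map);
}

// "İstanbul" in ISO-8859-9 starts with the byte 0xDD.
echo turkish_to_utf8("\xDDstanbul"), "\n"; // prints İstanbul (as UTF-8)
```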

Community
trante
  • And what if the author wants an approach where he doesn't know all the possible input characters? – AM- Apr 16 '13 at 18:04