2

I'm trying to pull sports players from my database, where their names are already stored as Unicode values. When I call json_encode, it gives up as soon as it hits a non-ASCII character in the format I've got:

$values = array('a'=>'BERDYCH, Tomáš','b'=>'FEDERER, Roger');
echo json_encode($values);

The result is:

{"a":"BERDYCH, Tom","b":"FEDERER, Roger"}

You can see the name was cut off after 'Tom', where the Unicode characters begin.

I understand json_encode only handles \uxxxx-style characters, but my database of thousands of sporting competitors already contains Unicode values. So somehow I need to convert characters like á into \uxxxx without updating my data source.

Any ideas?

Luc
  • 1
    `json_encode` handles Unicode just fine, at least it's supposed to. Are you sure your text is UTF-8 encoded? Show us a `bin2hex` of the text. – deceze Jan 23 '12 at 23:24
  • echo bin2hex('BERDYCH, Tomáš'); gives me 424552445943482c20546f6de19a – Luc Jan 23 '12 at 23:31
  • Maybe I've got to set an environment variable or something? Does the above code work for you and give you back a JSON-encoded version of Tomáš Berdych's full name? – Luc Jan 23 '12 at 23:32

4 Answers

1

json_encode() does this when it is given strings containing byte sequences that are not valid UTF-8.

If you are fetching data from the database, the most likely reason is that your connection is not UTF-8 encoded, and you are getting ISO-8859-1 data from your queries.

Show your database code for a suggestion on how to change this.

"I understand json_encode only handles \uxxxx style characters"

This is not true. json_encode() outputs non-ASCII characters as \uxxxx escapes, but it doesn't expect them in that form in the incoming data. The input should be plain UTF-8 text.
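
To illustrate, here is a minimal sketch that builds both versions of the name from explicit bytes, so the result doesn't depend on how the script file itself is saved. The second string uses the trailing bytes from the bin2hex output in the comments on the question (e1 9a). The exact failure mode is an assumption about the PHP version: recent versions return false and set JSON_ERROR_UTF8, while older ones truncated or nulled the string, as seen in the question.

```php
<?php
// "Tomáš" as valid UTF-8: á = C3 A1, š = C5 A1.
$utf8 = "BERDYCH, Tom\xC3\xA1\xC5\xA1";

// The same name with the bytes from the question's bin2hex output (E1 9A):
// a single-byte legacy encoding, which is not valid UTF-8.
$legacy = "BERDYCH, Tom\xE1\x9A";

// preg_match() with the /u modifier returns 1 for valid UTF-8
// and false when the subject contains invalid byte sequences.
$utf8IsValid   = preg_match('//u', $utf8) === 1;
$legacyIsValid = preg_match('//u', $legacy) === 1;

// Valid UTF-8 goes in; \uXXXX escapes come out.
$encodedUtf8 = json_encode($utf8);

// Invalid UTF-8 makes json_encode() fail instead of guessing.
$encodedLegacy = json_encode($legacy);
$legacyError   = json_last_error();

echo $encodedUtf8, "\n";
var_dump($utf8IsValid, $legacyIsValid, $encodedLegacy, $legacyError === JSON_ERROR_UTF8);
```

Checking json_last_error() after a failed call is usually the quickest way to confirm that encoding, rather than json_encode() itself, is the problem.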

Pekka
  • Thanks Pekka, but the two lines of code above cause the same issue, even without using a database? – Luc Jan 23 '12 at 23:45
  • @Luc is your source file UTF-8 encoded? (Usually in the "Save As" dialog of your IDE) – Pekka Jan 23 '12 at 23:47
  • Ahh, I see Pekka - thanks for that! 1) I've saved my PHP as UTF-8 encoded. 2) I'm investigating changing my columns, database tables and SQL default settings to encode default values as UTF-8. – Luc Jan 23 '12 at 23:58
1

Your source code and/or the data coming from the database is not encoded in UTF-8. I'd guess it's one of the specialized ISO-8859 encodings, but I'm not sure. When saving your source code, make sure it's saved in UTF-8. When getting data from the database, make sure you're setting the connection to utf8.
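
As a rough sketch of the connection part, assuming the database is MySQL and is accessed through PDO (neither is stated in the question), the charset can be requested in the DSN. The helper names `utf8Dsn` and `connectUtf8`, and the host and database names, are hypothetical placeholders.

```php
<?php
// Hypothetical helper: builds a PDO DSN for MySQL that asks for a
// UTF-8 connection. utf8mb4 is MySQL's full UTF-8 character set.
function utf8Dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: opens the connection with exceptions enabled.
// It is defined but not called here, because the credentials are placeholders.
function connectUtf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8Dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo utf8Dsn('localhost', 'tennis'), "\n";
// prints: mysql:host=localhost;dbname=tennis;charset=utf8mb4
```

With mysqli, the equivalent step would be calling `$mysqli->set_charset('utf8mb4')` right after connecting.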

See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text" and "Handling Unicode Front To Back In A Web App".

deceze
  • Thanks deceze, some very helpful links there. For any new projects from now on, the first thing I'll be doing is setting up UTF-8 from the beginning. – Luc Jan 24 '12 at 00:00
0

To make sure they are UTF-8, encode all values in your array:

$values = array_map('utf8_encode', $values);

If that doesn't help, use mb_detect_encoding() and mb_convert_encoding() to convert a language-specific encoding to UTF-8.
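
A minimal sketch of the mb_convert_encoding() route, assuming the mbstring extension is available and that the stored bytes are Windows-1252 (an assumption based on the e1 9a bytes in the question's bin2hex output; verify it against your own data). Note that utf8_encode() only converts from ISO-8859-1, where byte 9a is a control character rather than š, and it is deprecated as of PHP 8.2.

```php
<?php
// Assumption: the stored bytes are Windows-1252 (E1 = á, 9A = š).
$values = array(
    'a' => "BERDYCH, Tom\xE1\x9A",
    'b' => 'FEDERER, Roger',
);

// Convert every value from the known source encoding to UTF-8.
// array_map() keeps the string keys when given a single array.
$converted = array_map(function ($value) {
    return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
}, $values);

$json = json_encode($converted);
echo $json, "\n";
// prints: {"a":"BERDYCH, Tom\u00e1\u0161","b":"FEDERER, Roger"}
```

Naming the source encoding explicitly avoids relying on mb_detect_encoding(), which can only guess among single-byte encodings.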

opengrid
  • 2
    Please do not suggest `utf8_encode` without clarifying that the source needs to be in Latin1. Otherwise `utf8_encode` is useless! And "trying" or "guessing" is not the solution when it comes to encodings. *Know* what encoding you're dealing with! – deceze Jan 23 '12 at 23:41
  • There's more likely a better solution: Fixing the problem at its root. (Edit: deceze beat me to the punch!) – Pekka Jan 23 '12 at 23:41
0

It's a C# question, but take a look at "Converting Unicode strings to escaped ascii string" for an implementation that does this.

Community
Kevin Hakanson