
This has got me completely stumped:

```php
print_r($json);
echo json_encode($json);
```

Output:

```
Array
(
    [query] => dia
    [suggestions] => Array
        (
            [0] => Diana Johnson
            [1] => Diane Abbott
        )

)
{"query":"dia","suggestions":[null,null]}
```

What on earth is going wrong?

Edit: Just to add to the general WTF-ery of this, here's another sample:

```
Array
(
    [query] => david
    [suggestions] => Array
        (
            [0] => David Cameron
            [1] => David Amess
            [2] => David Anderson
            [3] => David Blunkett
            [4] => David Burrowes
        )

)
{"query":"david","suggestions":["David Cameron",null,null,null,null]}
```
fredley

2 Answers


I'm posting this as an answer because I need the full formatting abilities of the normal answer box.

Yeah, it's a UTF-8 problem all right. From the PHP interactive prompt:

```
php > $david = urldecode('David%A0Amess');
php > echo json_encode($david);
null
php > $david = urldecode('David%20Amess');
php > echo json_encode($david);
"David Amess"
php > $david = urldecode('David%c2%a0Amess');
php > echo json_encode($david);
"David\u00a0Amess"
```

So we can assume that you're dealing with either ISO-8859-1 or Windows-1252, given that the non-breaking space (NBSP) is stored as the single byte 0xA0 instead of the UTF-8 pair 0xC2 0xA0. We can fix this with iconv:

```
php > $david = urldecode('David%A0Amess');
php > $david_converted = iconv('Windows-1252', 'UTF-8', $david);
php > echo json_encode($david_converted);
"David\u00a0Amess"
```

So you can't trust what you're pulling out of MySQL, even assuming you've already done the SET NAMES thing. Something clearly went awry when the data was inserted: you probably weren't giving MySQL well-formed UTF-8, and it stupidly did not complain. (If you were using other, smarter, more correct databases and tried to insert the unencoded NBSP, they would have rejected the input.)
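Since the database output can't be trusted, a more defensive option is to check each string before converting it. The sketch below is an assumption-laden heuristic, not a guaranteed fix: `to_utf8` is a hypothetical name, and it assumes anything that is not valid UTF-8 must be Windows-1252. It relies on `preg_match()` with the `/u` modifier returning `false` for malformed UTF-8.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise ASSUME Windows-1252.
function to_utf8($s)
{
    // With /u, preg_match() returns 1 for valid UTF-8 and false for malformed input.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    return iconv('Windows-1252', 'UTF-8', $s);
}

// Illustrative rows: the last one ("Café") is already valid UTF-8.
$rows  = array('David Cameron', "David\xA0Amess", "Caf\xC3\xA9");
$clean = array_map('to_utf8', $rows);
echo json_encode($clean), "\n";
// ["David Cameron","David\u00a0Amess","Caf\u00e9"]
```

Unlike a blanket conversion, this doesn't mangle rows that were stored correctly. It can still guess wrong on Windows-1252 text that happens to form valid UTF-8 byte sequences, so fixing the data at insert time remains the real solution.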

Charles

This looks like an autocomplete script. I assume your results are loaded from a database; are you sure they're UTF-8? If you can't reproduce the problem by hardcoding the array, then it's probably an encoding issue.
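Hardcoding the array is a quick way to test the encoding theory. The sketch below uses the question's data with the NBSP stored as the single byte 0xA0. It assumes PHP 5.5 or later: older versions (such as the asker's 5.2) emitted `null` for each bad string by default, while 5.5+ makes `json_encode()` return `false` for the whole value, and the `JSON_PARTIAL_OUTPUT_ON_ERROR` flag (added in 5.5) restores the null-substitution behaviour seen in the question.

```php
<?php
$json = array(
    'query'       => 'david',
    'suggestions' => array('David Cameron', "David\xA0Amess"), // 0xA0 = Windows-1252 NBSP
);

// PHP 5.5+: malformed UTF-8 anywhere in the value makes the whole call fail.
var_dump(json_encode($json)); // bool(false)

// With partial output, each malformed string is replaced by null instead.
echo json_encode($json, JSON_PARTIAL_OUTPUT_ON_ERROR), "\n";
// {"query":"david","suggestions":["David Cameron",null]}
```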

According to http://php.net/manual/en/function.json-encode.php, "This function only works with UTF-8 encoded data."

You can also use json_last_error() (http://php.net/manual/en/function.json-last-error.php) to see the last error.
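A minimal sketch of checking the error after a failed encode. Note the version requirements: `json_last_error()` needs PHP 5.3, the `JSON_ERROR_UTF8` constant 5.3.3, and `json_last_error_msg()` 5.5, so none of this is available on 5.2.

```php
<?php
$encoded = json_encode(array("David\xA0Amess")); // 0xA0 is not valid UTF-8 on its own

if ($encoded === false && json_last_error() === JSON_ERROR_UTF8) {
    // Prints a human-readable reason such as "Malformed UTF-8 characters...".
    echo 'json_encode failed: ', json_last_error_msg(), "\n";
}
```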

Peeter
  • I'm only on 5.2 so no `json_last_error`. If the data isn't in UTF-8 (the collation on the table it's in is UTF-8) then how do I go about converting it? – fredley Mar 22 '11 at 23:03
  • `mysql_query('SET CHARACTER SET utf8')` – Peeter Mar 22 '11 at 23:04
  • We can tell pretty quickly if it's UTF8 or not by inspecting the bytes. What's the output of `echo rawurlencode($json['suggestions'][0]);`? – Charles Mar 22 '11 at 23:05
  • `David%20Cameron`, but the next is `David%A0Amess`. Looks like that is the problem, so what can I do? I just exported and re-imported the table via a utf-8 text document, that didn't help. – fredley Mar 22 '11 at 23:06
  • `0xA0` is not a valid UTF-8 sequence -- it's a Windows-1252 non-breaking space. In UTF-8, that character is represented by `0xC2` `0xA0`. However, even if it were choking over that mistake, I'd only expect David Amess to be missing from the output. David Cameron looks just fine... – Charles Mar 22 '11 at 23:10
  • @Peeter The query executed successfully but there's no change in what I'm getting back. Do I have to run it every time I want to make a query? – fredley Mar 22 '11 at 23:12
  • @Charles I've got all the data in a Notepad++ document. The 'Convert to UTF-8' command doesn't seem to help, is there some other way to convert the data? – fredley Mar 22 '11 at 23:14
  • I've posted my thoughts in a new answer, I needed formatting options that the comments box won't work with. – Charles Mar 22 '11 at 23:24
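Charles's `rawurlencode()` check can be extended into a small byte dump that makes the 0xA0 vs 0xC2 0xA0 difference visible at a glance. `dump_bytes` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: show a string's bytes as percent-escapes and as hex.
function dump_bytes($s)
{
    return rawurlencode($s) . ' / ' . bin2hex($s);
}

echo dump_bytes("David\xA0Amess"), "\n";     // Windows-1252 NBSP: a single a0 byte
echo dump_bytes("David\xC2\xA0Amess"), "\n"; // UTF-8 NBSP: the pair c2 a0
```

A lone `%A0` / `a0` between ASCII letters means the value is not UTF-8; the `%C2%A0` / `c2a0` form is what `json_encode()` expects.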