
I have a JSON string that contains `Dal\u00e9`. When I use `json_decode` on the JSON, it is converted to `DalÃ©`; however, the original string that the JSON comes from is `Dalé`. Why is this not converted properly?

I have found that `\u00E9` is the C/C++/Java source-code escape for é. However, that doesn't explain why this is going wrong.
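
To see what `json_decode` does with that escape, here is a minimal sketch (the variable names are made up for the example). The JSON escape `\u00e9` is decoded to é encoded as UTF-8, i.e. the two bytes C3 A9, so the decoded PHP string itself is already correct:

```php
<?php
// The JSON text as it arrives over the wire: a literal backslash-u escape.
// In a single-quoted PHP string, \u is not special, so this is exactly "Dal\u00e9".
$json = '"Dal\u00e9"';

// json_decode turns the escape into UTF-8: é (U+00E9) is the byte pair C3 A9.
$name = json_decode($json);

echo strlen($name), "\n";  // 5 bytes: D, a, l, C3, A9
echo bin2hex($name), "\n"; // 44616cc3a9
```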


Example PHP code that produces the incorrect output:

```php
<?php
// Keep the response body even if the server returns an HTTP error status
$opts = array('http' => array('ignore_errors' => true));
$context = stream_context_create($opts);
$jsonurl = "http://api.kivaws.org/v1/loans/552804.json";
$json = file_get_contents($jsonurl, false, $context);
$json_output = array(json_decode($json));

// Only read the error message if the decoded response contains an error object
if (isset($json_output[0]->error)) {
    $json_message = $json_output[0]->error->message;
}

foreach ($json_output[0]->{'loans'} as $loan) {
    echo 'Name: '.$loan->{'name'};
}
?>
```
michaellindahl
  • Because [PHP hates Unicode](http://stackoverflow.com/q/571694/20862). – Ignacio Vazquez-Abrams Jun 24 '13 at 04:56
  • PHP handles Unicode escape sequences in JSON perfectly fine. Likely, you have borked the string somewhere else, or you haven't told the web browser that you're giving it UTF-8. – goat Jun 24 '13 at 04:58
  • http://ideone.com/0BRsYA - it works fine – zerkms Jun 24 '13 at 05:00
  • @Ignacio Vazquez-Abrams: you're overdramatizing – zerkms Jun 24 '13 at 05:01
  • Perhaps. But that doesn't necessarily mean that I'm wrong. – Ignacio Vazquez-Abrams Jun 24 '13 at 05:01
  • @Ignacio Vazquez-Abrams: well, sort of. "Because" is wrong: the OP gets the wrong result not because of an issue in PHP but because of one on his side (presumably the browser charset setting). – zerkms Jun 24 '13 at 05:02
  • @zerkms it works fine with two \\ but notice that there is only one \ in the source JSON that I am retrieving. – michaellindahl Jun 24 '13 at 05:03
  • @michaellindahl: the two backslashes are part of PHP's string-escaping syntax. If you `echo` the string you will see there is only one backslash. (http://ideone.com/uNYDDq) – zerkms Jun 24 '13 at 05:04
  • It would be convenient if all this character set nonsense could be handled internally. Unfortunately it's left to the coder to get wrong. – Ignacio Vazquez-Abrams Jun 24 '13 at 05:04
  • @Ignacio Vazquez-Abrams: it's a representation issue, not a handling one. By that logic you could blame browsers for supporting multiple charsets. – zerkms Jun 24 '13 at 05:05
  • @zerkms Can you get it to correctly parse this source: http://api.kivaws.org/v1/loans/552804.json – michaellindahl Jun 24 '13 at 05:05
  • @michaellindahl: `$f = file_get_contents('http://api.kivaws.org/v1/loans/552804.json'); var_dump(json_decode($f));` --- it works fine. Make sure your browser's encoding is UTF-8, not something else. – zerkms Jun 24 '13 at 05:07
  • @chris I can't change the source: http://api.kivaws.org/v1/loans/552804.json I'm using the following: `$json = file_get_contents($jsonurl, false, $context); $json_output = array(json_decode($json));` – michaellindahl Jun 24 '13 at 05:07
  • @zerkms it doesn't work fine for me. My browser is set to UTF-8, and I don't have access to everyone's browser anyway. Is there a way to correct the example code I just posted? – michaellindahl Jun 24 '13 at 05:13
  • Is the server sending the page with the correct charset (utf8)? – Marcelo Pascual Jun 24 '13 at 05:16
  • @michaellindahl The browser, or your document? – Daedalus Jun 24 '13 at 05:16
  • @MarceloPascual No, apparently it wasn't – michaellindahl Jun 24 '13 at 05:24
  • @Daedalus I thought I was asked to check the browser's encoding, which was UTF-8; however, the document needed to have its charset set through the PHP header (see accepted answer). – michaellindahl Jun 24 '13 at 05:25
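
A minimal sketch of the backslash point zerkms makes above (variable names are made up for the example): in a single-quoted PHP string, `'\\u00e9'` and `'\u00e9'` are the same six characters, so the doubled backslash in the ideone test is no different from the single one in the real API response.

```php
<?php
// In a single-quoted PHP string only \\ and \' are escapes, so '\\u00e9'
// and '\u00e9' both hold the same six characters: \ u 0 0 e 9.
$escaped = '\\u00e9';
$plain   = '\u00e9';

var_dump($escaped === $plain); // bool(true)
var_dump(strlen($escaped));    // int(6)

// Wrapped in quotes, either one is valid JSON for the single character é.
var_dump(json_decode('"' . $escaped . '"') === "\xC3\xA9"); // bool(true)
```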

2 Answers


You need to tell the web browser what encoding you are giving it.

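```php
<?php
// Declare the encoding before any output is sent
header('content-type: text/plain; charset=utf-8');
$jsonStr = '{"name": "Dal\u00e9"}'; // sample input, for illustration
var_dump(json_decode($jsonStr));
```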
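
A rough illustration of why the missing charset shows up as `DalÃ©`: the sketch below mimics a browser that falls back to ISO-8859-1 / Windows-1252 when the page declares no encoding. That fallback is an assumption (browsers guess differently depending on locale and settings), and the helper `misread_as_latin1` is hypothetical, written by hand so the example needs no mbstring or iconv extension.

```php
<?php
// Simulate a browser that ignores the real encoding and reads each byte as
// ISO-8859-1 (for bytes 0xA0-0xFF this matches Windows-1252). Every byte
// becomes the code point with the same value, re-encoded here as UTF-8.
function misread_as_latin1($bytes)
{
    $out = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // two-byte UTF-8 lead byte
                  . chr(0x80 | ($b & 0x3F));   // continuation byte
        }
    }
    return $out;
}

$name = json_decode('"Dal\u00e9"');  // correct UTF-8: 44 61 6c c3 a9
echo misread_as_latin1($name), "\n"; // DalÃ©
```

Each of the two UTF-8 bytes of é is shown as its own character (C3 as Ã, A9 as ©), which is exactly the `Ã©` in the question; declaring `charset=utf-8` stops the browser from guessing.
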
goat
  • Note: for my case (having this inside a webpage) I will be using `header('content-type: text/html; charset=utf-8');` – michaellindahl Jun 24 '13 at 05:19
  • @michaellindahl: your comments are confusing ;-) – zerkms Jun 24 '13 at 05:20
  • @zerkms they make complete sense to me :P Apparently I wasn't setting the charset and was relying on the default, which was wrong. I didn't understand that this needed to be set in the PHP header, which I've never had to do before. – michaellindahl Jun 24 '13 at 05:23
  • Any reason why this needs to be done? Should I be setting the header in the PHP for every webpage or just in certain circumstances? – michaellindahl Jun 24 '13 at 05:26
  • @michaellindahl: yep, it's a good idea to always set it explicitly, because browsers aren't always good at guessing. – zerkms Jun 24 '13 at 05:52
  • @michaellindahl otherwise, a browser has to guess. And there are many situations where a correct guess can't be guaranteed, which leads to erratic and baffling bugs throughout your software. A robust piece of web software always specifies the encoding, for every page, every time. – goat Jun 24 '13 at 15:10
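
Following that advice on an ordinary HTML page could look like the sketch below. The function `render_loan_page` and the markup are made up for illustration; the page declares UTF-8 in the HTTP header and again in a `<meta charset>` tag, and escapes the name with `htmlspecialchars` using the same encoding.

```php
<?php
// Send the charset in the HTTP header before any output.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: builds a page that also declares UTF-8 in the markup,
// as a fallback for copies of the page that lose the HTTP header.
function render_loan_page($name)
{
    $safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
         . "<title>Loan</title></head>\n<body>Name: $safe</body></html>\n";
}

echo render_loan_page(json_decode('"Dal\u00e9"'));
```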

If you are using PHP 5.4 or later, you can pass the `JSON_UNESCAPED_UNICODE` option to `json_encode()`, like this:

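```php
<?php
// PHP 5.4+: write é as raw UTF-8 instead of the \u00e9 escape
echo $b = json_encode('Dalé', JSON_UNESCAPED_UNICODE); // prints "Dalé"
echo json_decode($b);                                  // prints Dalé
```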
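
For comparison, a small sketch of what the flag changes, assuming the PHP source file itself is saved as UTF-8 (`json_encode` expects UTF-8 input). The flag only affects how `json_encode` writes non-ASCII characters; both forms decode to the same string:

```php
<?php
// Requires PHP 5.4+ for JSON_UNESCAPED_UNICODE; assumes this file is UTF-8.
$default   = json_encode('Dalé');                         // escaped form
$unescaped = json_encode('Dalé', JSON_UNESCAPED_UNICODE); // raw UTF-8 form

echo $default, "\n";   // "Dal\u00e9"
echo $unescaped, "\n"; // "Dalé"

// Both JSON texts decode to the same UTF-8 string.
var_dump(json_decode($default) === json_decode($unescaped)); // bool(true)
```
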
Rajeev Ranjan