4

this issue seems specific to microsofttranslator.com so please ... any answers, if you can test against it ...

Using the following URL for translation: http://api.microsofttranslator.com/V2/Ajax.svc/TranslateArray .. I send via cURL some fantastic arguments, and get back the following result:

 [
      {
           "From":"en",
           "OriginalTextSentenceLengths":[13],
           "TranslatedText":"我是最好的",
           "TranslatedTextSentenceLengths":[5]
      },
      {
           "From":"en",
           "OriginalTextSentenceLengths":[16],
           "TranslatedText":"你是最好的",
           "TranslatedTextSentenceLengths":[5]
      }
 ]

When I use json_decode($output, true); on the output from cURL, json_decode gives an error about the syntax not being appropriate in the returned JSON:

 json_last_error() == JSON_ERROR_SYNTAX

The headers being returned with the JSON:

Response Headers

 Cache-Control:no-cache
 Content-Length:244
 Content-Type:application/x-javascript; charset=utf-8
 Date:Sat, 06 Aug 2011 13:35:08 GMT
 Expires:-1
 Pragma:no-cache
 X-MS-Trans-Info:s=63644

Raw content:

 [{"From":"en","OriginalTextSentenceLengths":[13],"TranslatedText":"我是最好的","TranslatedTextSentenceLengths":[5]},{"From":"en","OriginalTextSentenceLengths":[16],"TranslatedText":"你是最好的","TranslatedTextSentenceLengths":[5]}]

cURL code:

    $texts = array("i am the best" => 0, "you are the best" => 0);
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = array(
        'appId' => $bing_appId,
        'from' => 'en',
        'to' => 'zh-CHS',
        'texts' => json_encode(array_keys($texts))
    );
    curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data)); 
    $output = curl_exec($ch); 
sdolgy
  • 6,963
  • 3
  • 41
  • 61
  • I'd say that it has something to do with the character encoding somewhere along the way. Could you post more of the cURL code that you are using? – loganfsmyth Aug 06 '11 at 14:57
  • added. i personally don't think it's the cURL code but hope to be proven wrong! – sdolgy Aug 06 '11 at 15:08
  • Atleast for me, given json will go through json_decode without any problems... How did you echoed your json? – Waltsu Aug 06 '11 at 15:26
  • And you are just taking $output and passing straight to json_decode? If you print md_detect_encoding($output), do you get 'UTF-8'? I'm the same as @Waltsu, I pasted that stuff into a file and decoded the file contents, and it worked fine. – loganfsmyth Aug 06 '11 at 15:35
  • encoding: UTF-8 when i run: `$output = curl_exec($ch); echo mb_detect_encoding($output);` – sdolgy Aug 06 '11 at 22:58
  • How are you looking at the raw JSON? Are you using var_dump? Can you post the full output from var_dump($output), from "string..." onwards, please? – Matt Gibson Aug 18 '11 at 08:43

2 Answers2

6

The API is returning a wrong byte order mark (BOM).
The string data itself is UTF-8 but is prepended with U+FEFF which is a UTF-16 BOM. Just strip out the first two bytes and json_decode.

...
$output = curl_exec($ch);
// Insert some sanity checks here... then,
$output = substr($output, 3);
...
$decoded = json_decode($output, true);

Here's the entirety of my test code.

$texts = array("i am the best" => 0, "you are the best" => 0);
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = array(
    'appId' => $bing_appId,
    'from' => 'en',
    'to' => 'zh-CHS',
    'texts' => json_encode(array_keys($texts))
    );
curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data)); 
$output = curl_exec($ch);
$output = substr($output, 3);
print_r(json_decode($output, true));

Which gives me

Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)

Wikipedia entry on BOM

Cedric Han
  • 174
  • 3
1

There is nothing syntactically wrong with your JSON string. It is possible that the json is coming back with characters outside the UTF-8 byte range, but this would cause json_decode() to throw an exception indicating that.

Test Code:

ini_set("track_errors", 1);

$json = '
 [
      {
           "From":"en",
           "OriginalTextSentenceLengths":[13],
           "TranslatedText":"我是最好的",
           "TranslatedTextSentenceLengths":[5]
      },
      {
           "From":"en",
           "OriginalTextSentenceLengths":[16],
           "TranslatedText":"你是最好的",
           "TranslatedTextSentenceLengths":[5]
      }
 ]
';

$out = @json_decode($json, TRUE);

if(!$out) {
        throw new Exception("$php_errormsg\n");
} else {
        print_r($out);
}

?>

Output:

$ php -f jsontest.php 
Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )                                                                                                                                                                   

            [TranslatedText] => 我是最好的                                                                                                                                          
            [TranslatedTextSentenceLengths] => Array                                                                                                                                
                (                                                                                                                                                                   
                    [0] => 5                                                                                                                                                        
                )                                                                                                                                                                   

        )                                                                                                                                                                           

    [1] => Array                                                                                                                                                                    
        (                                                                                                                                                                           
            [From] => en                                                                                                                                                            
            [OriginalTextSentenceLengths] => Array                                                                                                                                  
                (                                                                                                                                                                   
                    [0] => 16                                                                                                                                                       
                )                                                                                                                                                                   

            [TranslatedText] => 你是最好的                                                                                                                                          
            [TranslatedTextSentenceLengths] => Array                                                                                                                                
                (                                                                                                                                                                   
                    [0] => 5                                                                                                                                                        
                )                                                                                                                                                                   

        )                                                                                                                                                                           

)
vicTROLLA
  • 1,554
  • 12
  • 15
  • Any idea how i troubleshoot then how the JSON i get back via cURL gives `JSON_ERROR_SYNTAX` but not when I run it against the JSON as-is? – sdolgy Aug 10 '11 at 15:39