Consider the following URL: click here

The page contains Japanese text. Firefox on my PC detects the encoding automatically and displays the characters. In Chrome, on the other hand, I have to change the encoding manually to "Shift_JIS" to see the Japanese characters.

If I fetch the content via PHP cURL, the text comes back garbled, like this:

���ϕi�̂��ƂȂ��I�݂��Ȃ̃N�`�R�~�T�C�g�������������i�A�b�g�R�X���j�ɂ��܂����I

I tried:

  curl_setopt($ch, CURLOPT_ENCODING, 'Shift_JIS');

I also tried (after downloading the curl response):

  $output_str = mb_convert_encoding($curl_response, 'Shift_JIS', 'auto');
  $output_str = mb_convert_encoding($curl_response, 'SJIS', 'auto');

But that does not work either.

Here is the full code:

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Connection: keep-alive'
    ));

    //curl_setopt($ch, CURLOPT_ENCODING, 'SJIS');
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $response = curl_exec($ch);
Amar Pratap
hvs

2 Answers

That page doesn't return valid HTML; it's actually JavaScript. If you fetch it with curl and echo it to a browser, add header('Content-type: text/html; charset=shift_jis'); to your code before any output, and the characters will display properly when you load it in Chrome.

Since the HTML doesn't specify the character set, you can specify it from the server using header().

To actually convert the encoding so it will display properly in your terminal, you can try the following:

Use iconv() to convert to UTF-8

$curl_response = iconv('shift-jis', 'utf-8', $curl_response);

Use mb_convert_encoding() to convert to UTF-8

$curl_response = mb_convert_encoding($curl_response, 'utf-8', 'shift-jis');

Both of those methods worked for me and I was able to see Japanese characters displayed correctly on my terminal.

UTF-8 should be fine, but if you know your system is using something different, you can try that instead.
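
A minimal sketch of the conversion step for a script that never touches a browser. The helper name `sjis_to_utf8` and the default source encoding `'SJIS'` are assumptions for illustration, and the validity check is an extra safeguard rather than part of the two one-liners above. Some pages labelled Shift_JIS actually use the Windows variant, in which case `'SJIS-win'` may be the better source encoding.

```php
<?php
// Hypothetical helper: the name and the default source encoding are assumptions.
function sjis_to_utf8(string $raw, string $from = 'SJIS'): string
{
    // Reject bytes that are not valid in the assumed source encoding,
    // rather than silently producing mojibake.
    if (!mb_check_encoding($raw, $from)) {
        throw new InvalidArgumentException("Input is not valid $from");
    }

    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// "日本" as Shift_JIS bytes, standing in for a cURL response body.
echo sjis_to_utf8("\x93\xFA\x96\x7B"), PHP_EOL; // prints 日本 on a UTF-8 terminal
```

Naming the source encoding explicitly avoids relying on `'auto'` detection, which may not include Shift_JIS in its detection order.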

Hope that helps.

drew010
  • I am trying to access Japanese characters through php script. I cannot use browser. Is it possible? – hvs Mar 22 '16 at 05:36
  • @hvs Just use the `iconv()` method. I just tested it, works just fine. You can use it in a regular PHP script and not a webpage. – Anonymous Mar 26 '16 at 19:55
  • @hvs, `mb_convert_encoding()` should work as well. In your original code you passed its parameters in the wrong order. See http://php.net/manual/en/function.mb-convert-encoding.php – Ivan Yarych Mar 26 '16 at 19:59
  • Hi, sorry for getting back to you so late. I was caught up in other things and frankly did not expect an answer after waiting for such a long time. Your mb_convert_encoding suggestion works! Many thanks. – hvs May 15 '16 at 17:20
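
A note on the `CURLOPT_ENCODING` attempt from the question: that option sets the `Accept-Encoding` request header, which concerns content compression (for example gzip or deflate), not the character set, so passing `'Shift_JIS'` to it is not expected to help. The sketch below keeps the two concerns separate. The helper name `fetch_raw` and the timeout values are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the response body as raw, unconverted bytes.
function fetch_raw(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        // Compression only: '' advertises every scheme this libcurl build supports.
        CURLOPT_ENCODING       => '',
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 20,
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $body;
}

// Character-set conversion is a separate step after the transfer, e.g.:
// $utf8 = mb_convert_encoding(fetch_raw($url), 'UTF-8', 'SJIS');
```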

The following code will output the Japanese characters correctly in the browser:

<?php

// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $setUrlHere);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

// grab URL content
$response = curl_exec($ch);

// close cURL resource, and free up system resources
curl_close($ch);

header('Content-type: text/html; charset=shift_jis');
echo $response;
Suleman C
  • Cannot use browser since it is a php script running on server and writing straight into db. – hvs Mar 22 '16 at 05:37
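
For the server-side case raised in the comment above, where the response goes straight into a database, one possible approach is to convert to UTF-8 before storing. The sketch below reads the charset from the response's `Content-Type` value when one is present and falls back to Shift_JIS otherwise. The function names and the `'SJIS'` fallback are assumptions; they are not taken from either answer.

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// value, or returns the fallback when none is declared.
function charset_from_content_type(?string $contentType, string $fallback = 'SJIS'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return $fallback;
}

// Hypothetical helper: converts a response body to UTF-8 for storage.
// An encoding name that mbstring does not recognise will raise an error here.
function to_utf8_for_storage(string $body, ?string $contentType): string
{
    $from = charset_from_content_type($contentType);
    return mb_convert_encoding($body, 'UTF-8', $from);
}

// Usage after curl_exec(), assuming $ch and $response from the code above:
// $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: null;
// $utf8 = to_utf8_for_storage($response, $type);
```

The database connection would also need to use a UTF-8 character set for the stored text to round-trip correctly.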