I have a string like this:

Óâàæàåìûé êëèåíò!

I want to decode it into Cyrillic characters. I already tried decoding it with mb_convert_encoding, but I don't get the proper result.

$string = 'Óâàæàåìûé êëèåíò!';
$stringEncode = mb_detect_encoding($string);

$result = mb_convert_encoding($string, "CP1251", $stringEncode);

echo $result; // ????????? ??????!

// With automatic encoding detection, the result is the same
$result = mb_convert_encoding($string, "CP1251");

echo $result; // ????????? ??????!

I tried different character encodings but always get the wrong result.

The correct result should be:

Уважаемый клиент!

Note: I tried online services to decode this string and got the proper result, so the string is not broken. It seems that PHP can't detect its encoding and convert it into Cyrillic.

Thanks for any help!

Update:

bin2hex output: c393c3a2c3a0c3a6c3a0c3a5c3acc3bbc3a920c3aac3abc3a8c3a5c3adc3b22120c387c3a0c3a2c3b2c3b0c3a020c3adc3a5c3aec3a1c3b5c3aec3a4c3a8c3acc3ae20c3a2c3adc3a5c3b1c3b2c3a820c3acc3a8c3adc3a8c3acc3a0c3abc3bcc3adc3bbc3a920c3afc3abc3a0c3b2c3a5c3a620c3afc3ae20c3a7c3a0c3a9c3acc3b320c3a220c3b0c3a0c3a7c3acc3a5c3b0c3a52036343120c3b0c3b3c3a1c3abc3a5c3a92e20c384c3abc3bf20c3aec3afc3abc3a0c3b2c3bb20c3a2c3aec3b1c3afc3aec3abc3bcc3a7c3b3c3a9c3b2c3a5c3b1c3bc20c3abc3a8c3b7c3adc3bbc3ac20c3aac3a0c3a1c3a8c3adc3a5c3b2c3aec3ac20707572652e636f6d2e7275
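Reading the bin2hex output: most characters are the two-byte pair `c3 xx`, which in UTF-8 encodes a code point between U+00C0 and U+00FF (accented Latin letters such as Ó or â). So the string is valid UTF-8 holding Latin letters, not raw Cyrillic bytes. A minimal sketch, assuming PHP 7.4+ with mbstring (for `mb_str_split` and `mb_ord`), that lists the code points of the first word:

```php
<?php
// First word of the bin2hex output from the question ("Óâàæàåìûé").
$hex = 'c393c3a2c3a0c3a6c3a0c3a5c3acc3bbc3a9';
$bytes = hex2bin($hex);

// The bytes form valid UTF-8...
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(true)

// ...and each character is a single code point in the U+00C0..U+00FF range.
$codePoints = [];
foreach (mb_str_split($bytes, 1, 'UTF-8') as $char) {
    $codePoints[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
}
echo implode(' ', $codePoints), PHP_EOL;
// U+00D3 U+00E2 U+00E0 U+00E6 U+00E0 U+00E5 U+00EC U+00FB U+00E9
```

Each of those values, taken as a single byte and read as Windows-1251, is a Cyrillic letter (0xD3 is У, 0xE2 is в, and so on), which hints at what went wrong earlier in the pipeline.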

"Where does that string come from originally?" Originally I get the response from an API in JSON format. Then I call utf8_encode on it (if I don't, json_decode returns null), and finally json_decode returns an array of data:

[
    'status'         => '1',
    'last_date'      => '15.05.2018 10:00:17',
    'last_timestamp' => '1526353217',
    'send_date'      => '15.05.2018 10:00:05',
    'send_timestamp' => '1526353205',
    'phone'          => '79270212817',
    'cost'           => '6.24',
    'sender_id'      => 'PURE',
    'status_name'    => 'Äîñòàâëåíî',
    'message'        => 'Óâàæàåìûé êëèåíò!'
];
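Because utf8_encode treats every input byte as ISO-8859-1, the damage can also be undone after the fact: convert the garbled text back to ISO-8859-1 to recover the original bytes, then read those bytes as Windows-1251. This is a minimal stopgap sketch, assuming the API really sends Windows-1251 and that mbstring is available; `fixMojibake` is a hypothetical helper name. Fixing the decoding step before json_decode is the cleaner solution.

```php
<?php
// Hypothetical helper: undo utf8_encode() that was applied to Windows-1251 bytes.
function fixMojibake(string $value): string
{
    // Step 1: UTF-8 -> ISO-8859-1 restores the original single-byte data
    // (the inverse of what utf8_encode() did).
    $originalBytes = mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
    // Step 2: interpret those bytes as Windows-1251 and re-encode as UTF-8.
    return mb_convert_encoding($originalBytes, 'UTF-8', 'Windows-1251');
}

$data = [
    'status'      => '1',
    'status_name' => 'Äîñòàâëåíî',
    'message'     => 'Óâàæàåìûé êëèåíò!',
];

// Repair every string value in the (possibly nested) array in place.
array_walk_recursive($data, function (&$value) {
    if (is_string($value)) {
        $value = fixMojibake($value);
    }
});

echo $data['message'], PHP_EOL;     // Уважаемый клиент!
echo $data['status_name'], PHP_EOL; // Доставлено
```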
Odin Thunder
  • What are the raw bytes of that string? `echo bin2hex($string)`. Where does that string come from originally? – deceze May 16 '18 at 10:32
  • Updated the question. – Odin Thunder May 16 '18 at 10:54
  • It sounds to me like that JSON is in some exotic encoding. Instead of doing `utf8_encode` and then trying to deal with the fallout, figure out what the encoding of the JSON is and convert it from that encoding to UTF-8 before json_decoding it. The best way to figure out the encoding is to ask the origin (any HTTP headers…?). Failing that, open it in a text editor using "Reopen using encoding…", or whatever your text editor calls it, trying different encodings until you have found one where the text looks as expected. – deceze May 16 '18 at 10:57
  • I will try to get the original encoding from the API response. Thanks a lot, I hadn't dug in that direction; I just copied the string from the json_decoded data and tried to convert it. PS: I don't understand people who downvote without any explanation. What's the point? – Odin Thunder May 16 '18 at 11:04
  • @deceze, how does bin2hex help to understand this case? (I can use it on the original API response.) – Odin Thunder May 16 '18 at 11:14
  • @deceze, thanks a lot, I fixed it ))) – Odin Thunder May 16 '18 at 11:28
  • Text consists of bytes. How those bytes are interpreted results in the characters you see on screen. Just seeing the characters on screen, you have no idea what the bytes are, and hence no idea whether the bytes are wrong or the interpretation is wrong. So always look at the raw bytes when dealing with encoding issues. Here the sample showed that you're dealing with Unicode characters instead of misinterpreted CP-whatever bytes, which led to the conclusion that some previous encoding step had already gone awry. – deceze May 16 '18 at 11:43
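The bytes-versus-interpretation point can be shown directly: the same nine bytes decode to Cyrillic when read as Windows-1251 and to accented Latin letters when read as ISO-8859-1, which is the encoding utf8_encode assumes. A small sketch, assuming mbstring is available:

```php
<?php
// Raw bytes of "Уважаемый" as stored in Windows-1251 (one byte per letter).
$bytes = "\xD3\xE2\xE0\xE6\xE0\xE5\xEC\xFB\xE9";

// Same bytes, two interpretations, both re-encoded as UTF-8 for display.
$asCp1251 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251');
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1'); // same as utf8_encode()

echo $asCp1251, PHP_EOL;          // Уважаемый
echo $asLatin1, PHP_EOL;          // Óâàæàåìûé
echo bin2hex($asLatin1), PHP_EOL; // c393c3a2c3a0c3a6c3a0c3a5c3acc3bbc3a9
```

The last line reproduces the start of the bin2hex output from the question, which is consistent with Windows-1251 data having been passed through an ISO-8859-1 to UTF-8 conversion.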

2 Answers

Following deceze's advice, I found out the encoding of the original API response (windows-1251). Then I rewrote my "prepare for json_decode" code and got the proper result.

//Replace this:
$contents = utf8_encode($response);
//To this:
$contents = mb_convert_encoding($response, 'utf-8', 'windows-1251');

$result   = json_decode($contents);
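The whole flow of this fix, as a self-contained sketch. Since the real API isn't available here, the Windows-1251 response is simulated by converting a UTF-8 literal (an assumption for demonstration only). It also shows why the original code needed utf8_encode at all: json_decode rejects input that isn't valid UTF-8.

```php
<?php
// Simulated API response: JSON whose bytes are Windows-1251.
$response = mb_convert_encoding(
    '{"status":"1","message":"Уважаемый клиент!"}',
    'Windows-1251',
    'UTF-8'
);

// Decoding the raw bytes fails: json_decode() requires UTF-8 input.
var_dump(json_decode($response));                // NULL
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)

// Convert from the actual source encoding first, then decode.
$contents = mb_convert_encoding($response, 'UTF-8', 'Windows-1251');
$result   = json_decode($contents, true);

echo $result['message'], PHP_EOL; // Уважаемый клиент!
```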

Note: utf8_encode converts ISO-8859-1 to UTF-8, so if you pass it data in another encoding (in my case windows-1251), you get an unexpected result. Big thanks to @mulquin and @deceze for helping me find this problem.

PS: Always check the encoding of the source data; don't repeat my mistakes :)
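One way to check the source encoding is the `charset` parameter of the response's Content-Type header (with cURL, for example, `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` returns it). A minimal sketch, assuming the API declares its charset there; `charsetFromContentType` is a hypothetical helper, and the header value below is an assumed example, not what this particular API sends. Note that in PHP 8, mb_convert_encoding throws a ValueError if the charset name is not one mbstring recognizes.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value, falling back to a default when none is declared.
function charsetFromContentType(string $contentType, string $default = 'UTF-8'): string
{
    if (preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return $default;
}

// Assumed header value and a Windows-1251 body ("Доставлено").
$contentType = 'application/json; charset=windows-1251';
$response = "{\"status_name\":\"\xC4\xEE\xF1\xF2\xE0\xE2\xEB\xE5\xED\xEE\"}";

$charset = charsetFromContentType($contentType);
if (strcasecmp($charset, 'UTF-8') !== 0) {
    $response = mb_convert_encoding($response, 'UTF-8', $charset);
}

$result = json_decode($response, true);
echo $result['status_name'], PHP_EOL; // Доставлено
```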

Odin Thunder

I think you may be right in your assumption that PHP can't handle some of the characters in this string. I did the following troubleshooting and haven't been able to find the problem. mb_check_encoding returns true for both strings, which suggests the conversion should be possible, but it doesn't work for some reason...

You may need to do a manual conversion: PHP Convert Windows-1251 to UTF 8

<?php

// Both literals are stored as UTF-8 in this source file.
$utf8_string = 'Óâàæàåìûé êëèåíò!';
$cp1251_string = 'Уважаемый клиент!';

// Detect each string's encoding against a single candidate.
$utf8_detect = mb_detect_encoding($utf8_string, 'UTF-8');
$cp1251_detect = mb_detect_encoding($cp1251_string, 'CP1251');

// mb_check_encoding checks that the bytes are valid in the given encoding.
$utf8_to_cp1251_check = mb_check_encoding($utf8_string, $cp1251_detect);
$cp1251_to_utf8_check = mb_check_encoding($cp1251_string, $utf8_detect);

// Without a source encoding, mb_convert_encoding converts from the internal encoding (UTF-8 by default).
$utf8_to_cp1251 = mb_convert_encoding($utf8_string, $cp1251_detect);
$cp1251_to_utf8 = mb_convert_encoding($cp1251_string, $utf8_detect);

// //TRANSLIT asks iconv to approximate characters that have no CP1251 equivalent.
$utf8_to_cp1251_iconv = iconv("UTF-8", "CP1251//TRANSLIT", $utf8_string);

var_dump($utf8_string);
var_dump($cp1251_string);

echo PHP_EOL;

var_dump($utf8_detect);
var_dump($cp1251_detect);

echo PHP_EOL;

var_dump($utf8_to_cp1251_check);
var_dump($cp1251_to_utf8_check);

echo PHP_EOL;

var_dump($utf8_to_cp1251);
var_dump($cp1251_to_utf8);

echo PHP_EOL;

var_dump($utf8_to_cp1251_iconv);

Output

string(32) "Óâàæàåìûé êëèåíò!" 
string(32) "Уважаемый клиент!" 

string(5) "UTF-8" 
string(12) "Windows-1251" 

bool(true) 
bool(true) 

string(17) "????????? ??????!" 
string(32) "Уважаемый клиент!" 

string(18) "???ae????? ??????!"
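One way to see where the `?` characters come from: when mb_convert_encoding converts to Windows-1251, any character with no Windows-1251 equivalent (such as Ó or â) is replaced by the substitute character, `?` by default. mb_check_encoding only checks that a string is valid in an encoding, not that it can be converted to it. A minimal sketch of a round-trip check; `isRepresentable` is a hypothetical helper, not a PHP function:

```php
<?php
// Hypothetical helper: can every character of a UTF-8 string be represented
// in $encoding? Checked by converting there and back and comparing.
function isRepresentable(string $utf8, string $encoding): bool
{
    $roundTrip = mb_convert_encoding(
        mb_convert_encoding($utf8, $encoding, 'UTF-8'),
        'UTF-8',
        $encoding
    );
    return $roundTrip === $utf8;
}

var_dump(isRepresentable('Óâàæàåìûé', 'Windows-1251')); // bool(false): letters become '?'
var_dump(isRepresentable('Óâàæàåìûé', 'ISO-8859-1'));   // bool(true)
var_dump(isRepresentable('Уважаемый', 'Windows-1251'));  // bool(true)
```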
Jacob Mulquin