
Using strtr(), str_ireplace() or preg_replace() together with array_walk_recursive(), I am trying to remove badly encoded characters from a multidimensional array. The data is UTF-8 encoded and comes from a cURL request.

I want to remove the double encoding and keep only the correctly encoded accented character:

ã©école => école

Array
(
    [0] => Array
        (
            [0] => ã©cole
            [1] => Array
                (
                    [0] => ã©ecole al inara avenue 2 mars casablanca
                    [1] => ã©ecole 42
                    [2] => grande ã©école
                )
        )
)
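Mojibake of this kind typically appears when UTF-8 bytes are read as Windows-1252 (or ISO-8859-1) and then converted to UTF-8 a second time, which turns é into Ã©. A minimal sketch that reproduces and reverses this with mb_convert_encoding(), assuming the Windows-1252 misreading and a string that is double-encoded as a whole (a string mixing correct and broken characters cannot be repaired this way):

```php
<?php
$original = 'école'; // correctly encoded UTF-8

// Simulate double encoding: treat the UTF-8 bytes as Windows-1252
// and convert them to UTF-8 a second time.
$double = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');
echo $double, PHP_EOL; // Ã©cole

// Undo the extra conversion (only valid if the whole string is double-encoded).
$repaired = mb_convert_encoding($double, 'Windows-1252', 'UTF-8');
echo $repaired, PHP_EOL; // école
```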

With PHP 7.2.6, my code throws an error when I do this. Is this a bad way to proceed?

Fatal error: Uncaught ArgumentCountError: Too few arguments to function

function fix_utf8(&$value, $key)
{
    $char = array('é','É','è','ê','ë','Ã','à¢','ù','î','ô','ë','ö','ü','à»','ç','à§','Å“','’','…','Å“','–','«','»','‚');
    $value =  str_ireplace($char, '', $value);
}

array_walk_recursive($result, 'fix_utf8'); // works in place; returns a bool, not the array

print_r($result);

OR

Fatal error: Uncaught ArgumentCountError: Too few arguments to function fix_utf8(), 1 passed in

function fix_utf8(&$value, $key)
{
    $char = array('é'=>'','É'=>'','è'=>'','ê'=>'','ë'=>'','Ã'=>'','à¢'=>'','ù'=>'','î'=>'','ô'=>'','ë'=>'','ö'=>'','ü'=>'','à»'=>'','ç'=>'','à§'=>'','Å“'=>'','’'=>'','…'=>'','Å“'=>'','–'=>'','«'=>'','»'=>'','‚'=>'');
    $value =  strtr(strtoupper($value), $char);
}

array_walk_recursive($result, 'fix_utf8'); // works in place; returns a bool, not the array

print_r($result);

OR

function fix_utf8(&$value, $key)
{
    // one pattern per sequence; each needs exactly one opening and one closing delimiter
    $char = array('/é/','/É/','/è/','/ê/','/ë/','/Ã/','/à¢/','/ù/','/î/','/ô/','/ë/','/ö/','/ü/','/à»/','/ç/','/à§/','/Å“/','/’/','/…/','/Å“/','/–/','/«/','/»/','/‚/');

    $value =  preg_replace($char, '', $value);
}

array_walk_recursive($result, 'fix_utf8'); // works in place; returns a bool, not the array

print_r($result);
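In all three attempts, array_walk_recursive() changes the array through the by-reference $value parameter and returns a bool, so its return value should not be assigned back to $result. A minimal sketch along the lines of the strtr() attempt, but mapping each double-encoded sequence to the intended character instead of deleting it. The function name fix_mojibake, the sample data and the map are hypothetical; the map assumes the usual UTF-8-read-as-Windows-1252 pattern (é shown as Ã©), so it would need to be adjusted to the sequences actually present in the data:

```php
<?php
// Hypothetical map from double-encoded sequences to the intended characters.
// Assumption: each key is a UTF-8 character read as Windows-1252 and
// re-encoded as UTF-8; extend it for the characters in your data.
function fix_mojibake(&$value, $key)
{
    static $map = [
        'Ã©' => 'é',
        'Ã¨' => 'è',
        'Ãª' => 'ê',
        'Ã§' => 'ç',
    ];

    // Leaves are expected to be strings; skip numbers, null, etc.
    if (is_string($value)) {
        $value = strtr($value, $map);
    }
}

// Made-up sample data with the same nesting as the array in the question.
$result = [['Ã©cole', ['Ã©cole 42', 'grande Ã©cole']]];

// Modifies $result in place; the bool return value is ignored.
array_walk_recursive($result, 'fix_mojibake');

print_r($result);
```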

Update:

Clarification: the cURL request retrieves JSON-formatted content containing Unicode characters, for example:

["école",["école d\u0027ingénieur"]]
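On a PHP build where json_decode() handles escape sequences correctly, this payload should decode to nested PHP arrays with the accents intact and \u0027 turned into an apostrophe. A minimal sketch, assuming the PHP source file itself is saved as UTF-8:

```php
<?php
// Sample payload from the update; single quotes keep "\u0027" as a JSON escape.
$json = '["école",["école d\u0027ingénieur"]]';

// true: decode JSON arrays/objects into PHP arrays.
$data = json_decode($json, true);

if (json_last_error() !== JSON_ERROR_NONE) {
    echo 'JSON error: ', json_last_error_msg(), PHP_EOL;
}

print_r($data);
```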
Sandra

2 Answers


Use iconv_mime_decode(), a predefined PHP function, to remove those characters:

<?php
$str = "ã©cole ã©ecole al inara avenue 2 mars casablanca";
echo iconv_mime_decode($str);
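For comparison, the PHP manual describes iconv_mime_decode() as decoding a MIME header field, i.e. RFC 2047 encoded-words of the form =?charset?B?...?=. A minimal sketch of that documented use, assuming the iconv extension is available; the header value is a made-up example that encodes "école" in Base64:

```php
<?php
// Made-up RFC 2047 encoded-word: "école" as UTF-8, Base64 ("B") encoded.
$header = '=?UTF-8?B?w6ljb2xl?=';

// Decode the encoded-word and return the text as UTF-8.
$decoded = iconv_mime_decode($header, 0, 'UTF-8');

echo $decoded, PHP_EOL;
```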


Siddhartha esunuri

The problem comes from a bug in PHP 7.2.6 on Windows: json_decode() does not correctly convert the Unicode characters contained in the JSON.

SOLUTION:

On Debian with PHP 7.0.30, json_decode() works correctly.

OR

You can write your own function that converts these Unicode escape sequences:

function unicodeString($str, $encoding=null) {
    // mbstring.internal_encoding is deprecated; ask mbstring for the current encoding instead
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/u', function($match) use ($encoding) {
        return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
    }, $str);
}
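A usage sketch for unicodeString(). The function is repeated so the snippet runs on its own, with mb_internal_encoding() as the fallback encoding; the input string is made up and the target encoding is passed explicitly:

```php
<?php
// Replace each \uXXXX escape with the character it encodes.
function unicodeString($str, $encoding = null) {
    if (is_null($encoding)) {
        $encoding = mb_internal_encoding();
    }
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/u', function ($match) use ($encoding) {
        // "00e9" -> bytes 0x00 0xE9, the UTF-16BE encoding of U+00E9.
        return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
    }, $str);
}

// Text containing JSON-style escapes (single quotes keep the backslashes).
$raw = 'école d\u0027ingénieur, caf\u00e9';

echo unicodeString($raw, 'UTF-8'), PHP_EOL;
```

Note that this only handles escapes in the Basic Multilingual Plane; surrogate pairs such as \ud83d\ude00 are converted one half at a time and do not produce the intended character.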

Credit

OR

Or use the direct mapping between UTF-16BE (big-endian) and Unicode code points:

echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE'); // bytes 0x10 0x00 = U+1000 in UTF-16BE
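The same mapping can be wrapped in a small helper that takes a numeric code point: for code points in the Basic Multilingual Plane (excluding surrogates), the UTF-16BE bytes are simply the code point written as a 16-bit big-endian integer. A minimal sketch; codepoint_to_utf8 is a hypothetical name:

```php
<?php
// Hypothetical helper: BMP code point (U+0000..U+FFFF, no surrogates) -> UTF-8.
function codepoint_to_utf8($codepoint)
{
    // pack('n', ...) writes an unsigned 16-bit big-endian value,
    // which is exactly the UTF-16BE encoding of a BMP code point.
    return mb_convert_encoding(pack('n', $codepoint), 'UTF-8', 'UTF-16BE');
}

echo codepoint_to_utf8(0x00E9), PHP_EOL; // é
echo codepoint_to_utf8(0x1000), PHP_EOL; // same character as "\x10\x00" above
```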

Credit

Sandra