I am trying to convert all of the UTF-8 (non-ASCII) characters in a string to plain ASCII characters. I loop through every character of the string and decide, based on the character, whether it has to be changed. This works fine for ASCII characters, but the code doesn't change the UTF-8 characters.

Here is my function:

    function toNoUTFChars($inputString){
        $stringArray = str_split($inputString);
        $finalString = '';
        foreach ($stringArray as $char) {
            if($char == 'ě' || $char == 'é'){$finalString .= 'e';
            }else if($char == 'š'){$finalString .= 's';
            }else if($char == 'č'){$finalString .= 'c';
            }else if($char == 'ř'){$finalString .= 'r';
            }else if($char == 'ý'){$finalString .= 'y';
            }else if($char == 'á'){$finalString .= 'a';
            }else if($char == 'í'){$finalString .= 'i';
            }else if($char == ' '){$finalString .= '-';
            }else if($char == 'ú' || $char == 'ů'){$finalString .= 'e';
            }else if($char == 'ň'){$finalString .= 'n';
            }else if($char == 'ť'){$finalString .= 't';
            }else if($char == 'ď'){$finalString .= 'd';
            }else if($char == 'ó'){$finalString .= 'o';
            }else if($char == 'ň'){$finalString .= 'n';
            }else if(ctype_alpha($char)){
                $finalString .= $char;
            }
        }
        return $finalString;
    }

Example input: "Test Outputěěěččč with utf8ččč"

Expected output: "Test-Outputeeeccc-with-utf8ccc"

Output I am getting: "Test-Output-with-utf8" // UTF-8 chars are missing :(

Evert
Jakub Menšík
    `str_split` doesn't work with multibyte characters. Might need [mb_str_split](https://www.php.net/manual/en/function.mb-str-split.php) – apokryfos Dec 03 '20 at 22:12
    Consider using [Transliterator](https://www.php.net/manual/en/class.transliterator.php) class from intl extension, e.g. `$finalString = transliterator_transliterate('Any-Latin; Latin-ASCII', $inputString);` – Ruslan Osmanov Dec 03 '20 at 22:36
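
To illustrate the first comment: `str_split` cuts the string into single bytes, and each of the accented letters above takes two bytes in UTF-8, so a one-byte `$char` can never equal `'ě'`. A minimal sketch of the same loop using `mb_str_split` follows. It assumes PHP 7.4+ with the mbstring and ctype extensions and a source file saved as UTF-8. The name `toAsciiSlug` is hypothetical. The sketch departs from the question's code in two places: `ú`/`ů` map to `u` rather than `e`, and `ctype_alnum` replaces `ctype_alpha` so that the digit in the expected output survives.

```php
<?php
// Hypothetical helper; the map only covers the letters listed in the question.
function toAsciiSlug(string $input): string
{
    // Whole-character replacements (keys are multibyte UTF-8 strings).
    $map = [
        'ě' => 'e', 'é' => 'e', 'š' => 's', 'č' => 'c', 'ř' => 'r',
        'ý' => 'y', 'á' => 'a', 'í' => 'i', 'ú' => 'u', 'ů' => 'u',
        'ň' => 'n', 'ť' => 't', 'ď' => 'd', 'ó' => 'o', ' ' => '-',
    ];

    $out = '';
    // mb_str_split() yields one full character per element, not one byte.
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        if (isset($map[$char])) {
            $out .= $map[$char];      // known accented letter or space
        } elseif (ctype_alnum($char)) {
            $out .= $char;            // plain ASCII letter or digit
        }
        // anything else is dropped, as in the original loop
    }
    return $out;
}

echo toAsciiSlug('Test Outputěěěččč with utf8ččč'), "\n";
// prints "Test-Outputeeeccc-with-utf8ccc"
```

Keeping the replacements in an array instead of an `if`/`else if` chain also makes duplicate or mistyped branches easier to spot.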

1 Answer

Over the years I've experimented with a lot of things, but this is the only way it worked for me under all circumstances:

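    function remove_accents($txt) {
        // Random heredoc delimiter, so the text is unlikely to contain it.
        $q = 'EOF'.mt_rand(100000000, 999999999);
        // Quoting the delimiter ('$q') stops the shell from expanding
        // variables and command substitutions that appear inside $txt.
        $cmd = "LC_CTYPE=en_US.utf8 iconv -f UTF-8 -t ASCII//TRANSLIT <<'$q'\n$txt\n$q";
        // Run the iconv binary and strip the trailing newline added by the heredoc.
        return substr(`$cmd`, 0, -1);
    }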

It's not very efficient, because every call starts the external iconv binary through the shell.

soger
  • Why would you not just use PHP's built-in iconv functions to do the same thing? – miken32 Dec 04 '20 at 02:20
  • @miken32 This remove_accents() function contained the built-in iconv() call for a while, but it sometimes didn't work correctly. This was several years ago, so honestly I can't remember the details. Then, for a while, it was just the iconv binary call, and then again some strings could not be converted; that's when I added the LC_CTYPE setting. If you're worried about performance, you can try the iconv() call; its parameters are the same as the -f and -t switches. I recommend you monitor your string conversions, though, especially if you use the function on strings received from unknown sources. – soger Dec 04 '20 at 13:13
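
For comparison, the in-process variant mentioned in these comments might look roughly like the sketch below. It passes the same encodings as the `-f` and `-t` switches to PHP's `iconv()` and tries to mirror the `LC_CTYPE` setting with `setlocale()`. The name `remove_accents_builtin` is hypothetical. The result of `//TRANSLIT` depends on the platform's iconv implementation and on which locales are installed, so this is something to test on the target system, not a drop-in replacement.

```php
<?php
// Hypothetical name; a sketch of the built-in iconv() variant, not the answer's method.
function remove_accents_builtin(string $txt)
{
    // Remember the current LC_CTYPE setting ('0' only queries it).
    $previous = setlocale(LC_CTYPE, '0');

    // Try to mirror LC_CTYPE=en_US.utf8 from the shell command.
    // setlocale() returns false if none of these locales is installed.
    setlocale(LC_CTYPE, 'en_US.utf8', 'en_US.UTF-8');

    // Same source and target encodings as the -f and -t switches.
    // iconv() returns false (and raises a notice) when conversion fails.
    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $txt);

    // Restore the previous locale so other code is not affected.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $result;
}

// The exact output for accented input varies between systems.
var_dump(remove_accents_builtin('Test Outputěěěččč with utf8ččč'));
```

Checking the return value for `false` on every call is one way to do the monitoring suggested above.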