17

Related questions:

  1. How to replace characters in a java String?
  2. How to replace special characters with their equivalent (such as " á " for " a") in C#?

As in the questions above, I'm looking for a reliable, robust way to reduce any Unicode character to its nearest ASCII equivalent using PHP. I'd really like to avoid rolling my own lookup table.

For example (stolen from the first referenced question): Gračišće becomes Gracisce

Dolph

5 Answers

35

The iconv module can do this; more specifically, the iconv() function:

$str = iconv('Windows-1252', 'ASCII//TRANSLIT//IGNORE', "Gracišce");
echo $str;
//outputs "Gracisce"

The main hassle with iconv is that you have to watch your encodings, but it's definitely the right tool for the job. (I used 'Windows-1252' for the example due to limitations of the text editor I was working with ;) The feature of iconv that you definitely want is the //TRANSLIT flag, which tells iconv to transliterate any character that has no ASCII match into the closest approximation.
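For UTF-8 input, the same call might look like the sketch below. The helper name `to_ascii_iconv` and the `en_US.UTF-8` locale are assumptions made for illustration: the locale has to be installed on the system, and the exact replacements depend on the iconv implementation PHP was built against.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes $utf8 is valid UTF-8 and that the en_US.UTF-8 locale is installed.
function to_ascii_iconv(string $utf8)
{
    // With glibc, //TRANSLIT consults LC_CTYPE; under "C" or "POSIX"
    // unmatched characters tend to come out as "?".
    // Note that setlocale() changes the locale for the whole process.
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    // @ silences the notice/warning iconv raises when the conversion
    // is not supported. Returns the converted string, or false on failure.
    return @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);
}

$result = to_ascii_iconv('Gračišće');
// The result varies by iconv implementation (glibc, libiconv, musl).
echo $result === false ? "conversion failed\n" : $result . "\n";
```

Checking the return value for `false` matters here, because an unsupported conversion fails rather than returning the input unchanged.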

zombat
  • 1
    Transliteration is now my word of the day. – Dolph Apr 16 '10 at 15:42
  • 4
    Note, this doesn't work properly when locale category `LC_CTYPE` is set to `C` or `POSIX` (you can check what your locale is with `echo setlocale(LC_ALL, 0);`). All non-ASCII characters will be converted to `?`s. Instead you will need to set the locale to something else first, e.g. `setlocale(LC_ALL, "en_US.UTF-8")`. – Mike Jun 07 '13 at 06:05
  • @Mike thanks for your hint. If not for you, I might have never solved that problem. – Buttle Butkus Jul 18 '13 at 23:40
  • 1
    If the character is not found, a "?" replaces that special character. This should not be the most voted answer; it's misleading. – machineaddict Apr 27 '16 at 12:26
4

I found another solution, based on @zombat's answer.

The issue with his answer was that I was getting:

Notice: iconv() [function.iconv]: Wrong charset, conversion from `UTF-8' to `ASCII//TRANSLIT//IGNORE' is not allowed in D:\www\phpcommand.php(11) : eval()'d code on line 3

And after removing //IGNORE from the function, I got:

Gr'a'e~a~o^O"ucisce

So, the š character was translated correctly, but the other characters weren't.

The solution that worked for me combines @zombat's solution with preg_replace, which removes everything outside [a-zA-Z0-9.] (spaces included):

preg_replace('/[^a-zA-Z0-9.]/','',iconv('UTF-8', 'ASCII//TRANSLIT', "GráéãõÔücišce"));

Output:

GraeaoOucisce
dmmd
2

My solution is to create two strings: the first holds the unwanted letters and the second holds their replacements, in the same order.

$from = 'čšć';
$to   = 'csc';
$text = 'Gračišće';

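// mb_str_split() (PHP 7.4+, mbstring) splits by character; str_split() splits
// by byte and would break the multibyte letters in $from apart.
$result = str_replace(mb_str_split($from), mb_str_split($to), $text); // Gracisce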
hsz
2

Try this:

function normal_chars($string)
{
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $string = preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string);
    $string = preg_replace('~[^0-9a-z]+~i', ' ', $string);
    return trim($string);
}

Examples:

echo normal_chars('Álix----_Ãxel!?!?'); // Alix Axel
echo normal_chars('áéíóúÁÉÍÓÚ'); // aeiouAEIOU
echo normal_chars('üÿÄËÏÖÜŸåÅ'); // uyAEIOUYaA

Based on the selected answer in this thread: URL Friendly Username in PHP?

John Conde
  • 2
    +1, but this only works for a subset of cases. For example, "Škoda" becomes "Scaron koda". – Dolph Jul 27 '10 at 21:41
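The "Škoda" case fails because `caron` is missing from the list of entity suffixes, so `&Scaron;` is never reduced to `S`. One possible tweak, sketched below, adds `caron` to the alternation. The function name `normal_chars_caron` is made up for this example, and the sketch assumes the default HTML 4.01 entity table. Characters with no named entity there (č or ć, for example) are still dropped, so this is only a partial fix.

```php
<?php
// Hypothetical variant of normal_chars() with "caron" added to the suffix list.
function normal_chars_caron(string $string): string
{
    // Turn accented characters that have a named HTML entity into e.g. "&Scaron;".
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    // Keep only the base letter(s) of each recognised entity.
    $string = preg_replace(
        '~&([a-z]{1,2})(acute|caron|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $string
    );
    // Collapse each run of remaining non-alphanumeric bytes into a single space.
    $string = preg_replace('~[^0-9a-z]+~i', ' ', $string);
    return trim($string);
}

echo normal_chars_caron('Škoda'), "\n"; // Skoda
```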
1

You can also try transliterator_transliterate() from the intl extension:

transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', "ÀÖØöøįĴőŔžǍǰǴǵǸțȞȟȤȳɃɆɏ");

//Will output
aooooijorzajggnthhzybey

I found this from here: https://www.php.net/manual/en/transliterator.transliterate.php#111939
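Since `transliterator_transliterate()` is only available when the intl extension is loaded, one option is to try it first and fall back to `iconv()`. The sketch below illustrates that idea; the function name `to_ascii` is made up, and the transliterator ID leaves out `Lower()` so the case of the input is kept. Neither path is guaranteed to return pure ASCII for every possible input.

```php
<?php
// Hypothetical helper combining two approaches from this thread.
// Assumes $text is valid UTF-8. Returns a string, or false on failure.
function to_ascii(string $text)
{
    if (function_exists('transliterator_transliterate')) {
        // intl/ICU: convert any script to Latin, then Latin to ASCII.
        $out = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
        if ($out !== false) {
            return $out;
        }
    }
    if (function_exists('iconv')) {
        // Fallback: results depend on the iconv implementation and,
        // with glibc, on the LC_CTYPE locale.
        return @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    }
    return false;
}

var_dump(to_ascii('Gračišće'));
```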