115

I'm attempting to remove accents from characters in a PHP string as the first step toward making the string usable in a URL.

I'm using the following code:

$input = "Fóø Bår";

setlocale(LC_ALL, "en_US.utf8");
$output = iconv("utf-8", "ascii//TRANSLIT", $input);

print($output);

The output I would expect would be something like this:

F'oo Bar

However, instead of the accented characters being transliterated they are replaced with question marks:

F?? B?r

Everything I can find online indicates that setting the locale will fix this problem, however I'm already doing this. I've already checked the following details:

  1. The locale I am setting is supported by the server (included in the list produced by locale -a)
  2. The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l)
  3. The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator)
  4. The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE)

The cause of the problem:

The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.

Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
PHP manual's introduction to iconv

Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.
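
A quicker check than reading the whole phpinfo page is to print the constants exposed by the iconv extension. This is a minimal sketch, assuming the iconv extension is loaded; the exact strings reported vary by platform and build.

```php
<?php
// ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension.
// ICONV_IMPL is typically "glibc" or "libiconv", depending on how PHP was built.
if (extension_loaded('iconv')) {
    printf("iconv implementation: %s\n", ICONV_IMPL);
    printf("iconv version: %s\n", ICONV_VERSION);
} else {
    echo "The iconv extension is not loaded.\n";
}
```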

(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)

georgebrock
  • Note that if you're running this on a string that can't be ASCII, this will have dramatic effects. For example a Russian string won't work with ASCII. – Yvan Oct 04 '11 at 12:01
  • 1
    I have the glibc version installed and setting the locale works for me. –  Apr 02 '12 at 19:47
  • So you had to compile it? I can't find a deb package anywhere. Exactly coz of the reason that "IT'S" in glibc already :-( – sumid May 10 '13 at 19:04
  • 1
    This guy suggests a clever solution using htmlentities(). Sorry it's in French, but you just need the small functions at the bottom of the doc: http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html Really clever :) – JFG Dec 10 '13 at 20:35
  • 1
    For those who want to see the code @JFG is talking about, you can also find it here: https://github.com/ICanBoogie/Common/blob/ec90b2d854a49882c814c84f67ed54bbb566aac0/lib/helpers.php#L139 – mems Dec 03 '14 at 17:46
  • `utilphp/php::remove_accents('Àccent') # => Accent ` - http://brandonwamboldt.github.io/utilphp/#remove_accents – Bananaapple Feb 06 '19 at 11:10

31 Answers

125

What about the WordPress implementation?

function remove_accents($string) {
    if ( !preg_match('/[\x80-\xff]/', $string) )
        return $string;

    $chars = array(
    // Decompositions for Latin-1 Supplement
    chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
    chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
    chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
    chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
    chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
    chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
    chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
    chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
    chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
    chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
    chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
    chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
    chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
    chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
    chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
    chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
    chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
    chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
    chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
    chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
    chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
    chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
    chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
    chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
    chr(195).chr(185) => 'u',
    chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
    chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
    chr(195).chr(191) => 'y',
    // Decompositions for Latin Extended-A
    chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
    chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
    chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
    chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
    chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
    chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
    chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
    chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
    chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
    chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
    chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
    chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
    chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
    chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
    chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
    chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
    chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
    chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
    chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
    chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
    chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
    chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
    chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
    chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
    chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
    chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
    chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
    chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
    chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
    chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
    chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
    chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
    chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
    chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
    chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
    chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
    chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
    chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
    chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
    chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
    chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
    chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
    chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
    chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
    chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
    chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
    chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
    chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
    chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
    chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
    chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
    chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
    chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
    chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
    chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
    chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
    chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
    chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
    chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
    chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
    chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
    chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
    chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
    chr(197).chr(190) => 'z', chr(197).chr(191) => 's'
    );

    $string = strtr($string, $chars);

    return $string;
}

To understand what this function does, check the conversion table:

À => A
Á => A
 => A
à => A
Ä => A
Å => A
Ç => C
È => E
É => E
Ê => E
Ë => E
Ì => I
Í => I
Î => I
Ï => I
Ñ => N
Ò => O
Ó => O
Ô => O
Õ => O
Ö => O
Ù => U
Ú => U
Û => U
Ü => U
Ý => Y
ß => s
à => a
á => a
â => a
ã => a
ä => a
å => a
ç => c
è => e
é => e
ê => e
ë => e
ì => i
í => i
î => i
ï => i
ñ => n
ò => o
ó => o
ô => o
õ => o
ö => o
ù => u
ú => u
û => u
ü => u
ý => y
ÿ => y
Ā => A
ā => a
Ă => A
ă => a
Ą => A
ą => a
Ć => C
ć => c
Ĉ => C
ĉ => c
Ċ => C
ċ => c
Č => C
č => c
Ď => D
ď => d
Đ => D
đ => d
Ē => E
ē => e
Ĕ => E
ĕ => e
Ė => E
ė => e
Ę => E
ę => e
Ě => E
ě => e
Ĝ => G
ĝ => g
Ğ => G
ğ => g
Ġ => G
ġ => g
Ģ => G
ģ => g
Ĥ => H
ĥ => h
Ħ => H
ħ => h
Ĩ => I
ĩ => i
Ī => I
ī => i
Ĭ => I
ĭ => i
Į => I
į => i
İ => I
ı => i
Ĳ => IJ
ĳ => ij
Ĵ => J
ĵ => j
Ķ => K
ķ => k
ĸ => k
Ĺ => L
ĺ => l
Ļ => L
ļ => l
Ľ => L
ľ => l
Ŀ => L
ŀ => l
Ł => L
ł => l
Ń => N
ń => n
Ņ => N
ņ => n
Ň => N
ň => n
ŉ => N
Ŋ => n
ŋ => N
Ō => O
ō => o
Ŏ => O
ŏ => o
Ő => O
ő => o
Œ => OE
œ => oe
Ŕ => R
ŕ => r
Ŗ => R
ŗ => r
Ř => R
ř => r
Ś => S
ś => s
Ŝ => S
ŝ => s
Ş => S
ş => s
Š => S
š => s
Ţ => T
ţ => t
Ť => T
ť => t
Ŧ => T
ŧ => t
Ũ => U
ũ => u
Ū => U
ū => u
Ŭ => U
ŭ => u
Ů => U
ů => u
Ű => U
ű => u
Ų => U
ų => u
Ŵ => W
ŵ => w
Ŷ => Y
ŷ => y
Ÿ => Y
Ź => Z
ź => z
Ż => Z
ż => z
Ž => Z
ž => z
ſ => s

You can generate the conversion table yourself by simply iterating over the $chars array of the function:

foreach($chars as $k=>$v) {
   printf("%s => %s\n", $k, $v);
}
8ctopus
dynamic
  • 5
    This should have been the accepted answer, since it was implemented in a safer way (using chr() function) instead of hard-coding accented characters, which might get overwritten in some text-editors. – Mladen B. Sep 17 '14 at 04:59
  • 2
    Note that the new implementation is now located here: https://core.trac.wordpress.org/browser/tags/4.1/src/wp-includes/formatting.php#L822 . The code requires other functions provided by WordPress core and cannot be copy-pasted without some (light) work. – Julien Fastré Dec 30 '14 at 22:54
  • The only snippet that worked for me for file upload normalization. Thanks for sharing and +1. – Giorgio Feb 28 '15 at 08:45
  • 1
    Working as a charm, THANK YOU! – Salvi Pascual May 02 '16 at 18:05
  • this is a very incomplete implementation, see John R's reply – user151496 May 24 '16 at 09:14
  • 5
    It should be noted that WordPress is GPLv2 licensed. – Diogo Kollross Jul 11 '16 at 23:00
  • What are you talking about? This is a copy-paste code and have nothing to do with wordpress. Just implemented in 1 second – erdomester Aug 10 '16 at 15:41
  • Marvelous. Thank you. – Zariweya Oct 17 '16 at 11:01
  • 1
    This link is up to date: https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php#L1596 – online Thomas Sep 12 '18 at 10:19
  • Sometimes accents are encoded as separated letters as "diacritics marks". To remove those, use also `preg_replace('/\p{M}/u',"", $txt);` – Vincent Fourmond Jan 23 '20 at 12:46
  • As of php 7.4 the initial if statement does not detect some Czech letters, for example words like `PSČ` or `MĚSTO` are not recognized. After removing the optimization if statement everything works fine. – Hhyperion Mar 02 '22 at 10:17
59

UTF-8 friendly version of the simple function posted by Gino:

function stripAccents($str) {
    return strtr(utf8_decode($str), utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'), 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}

Had to come to this because my php document was UTF-8 encoded.

Hope it helps.

  • 1
    Certain characters in UTF8 do not work properly for me using this function. I believe that's due to utf8_decode(), which converts from UTF8 to ISO-8859-1. – Trevor Gehman Apr 07 '14 at 00:26
  • @trevor-gehman: strtr() only works on single-byte characters, hence those in Unicode [Latin-1 Supplement](http://unicode-table.com/en/sections/latin-1-supplement). – ChrisV Jun 16 '14 at 17:07
  • 1
    Strtr works fine for replacing multi byte UTF8 characters, but you need to use the variant where you supply an associative array as the second argument. Then you can have a multi byte character as the key or value in any position of that array. Make sure your text editor is in UTF8 mode or encode using "\xc2\x81" type syntax. – thomasrutter May 13 '16 at 09:09
  • this is a very incomplete implementation, see John R's reply – user151496 May 24 '16 at 09:14
  • 1
    Note that `utf8_decode` is deprecated since PHP 8.2.0, and should not be used anymore. – Alexis Jan 25 '23 at 10:28
50

This is a piece of code I found and use often:

function stripAccents($stripAccents){
  return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}
Lucas
Gino
  • 17
    As `strtr()` isn't multibyte aware, if your script file is encoded in a multibyte format (e.g. UTF-8) this function produces wrong results. – Gras Double Jun 26 '14 at 04:26
  • True. I tried to fork a github project just to edit a single line, not even related to accented chars, but when I saved the changes and created a pull request, it included the additional changes on all the lines that had accented chars hard-coded. The safer way is to use chr(). – Mladen B. Sep 17 '14 at 04:50
  • 2
    ...additionally - this list does not contain **many** accented characters, e.g. `ů, ž, ř, č, ...` – jave.web Apr 24 '17 at 01:10
  • Señor, Bjørk, [Ł](https://en.wikipedia.org/wiki/Ł), etc. This is a list that is far, far longer than what's here. – tadman Jan 05 '18 at 17:56
47

If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this solves your problem:

$string = "Fóø Bår";
$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);
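
If the full rule string is more than you need, ICU's compound transform IDs offer a shorter route. The sketch below assumes the intl extension is installed; to_ascii() is a hypothetical helper name, not part of any library.

```php
<?php
// Sketch only: requires the intl extension. to_ascii() is a made-up name.
function to_ascii(string $text): string
{
    // "Any-Latin" romanises non-Latin scripts; "Latin-ASCII" then folds
    // accented Latin letters down to plain ASCII.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    // The function returns false on failure; fall back to the input then.
    return $result === false ? $text : $result;
}

echo to_ascii("Fóø Bår"), "\n";
```

Unlike the rule string above, this variant leaves the case of the input untouched because it has no Lower() step.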
gabo
  • 2
    `Lower()` is not required in this case – Rey0bs Nov 13 '17 at 16:49
  • These links might help figuring out rules and what NFD and NFC means: [ICU User guide](http://userguide.icu-project.org/transforms/general#TOC-General) and [Unicode Norm Forms](http://www.unicode.org/reports/tr15/#Norm_Forms) – Zedzdead Aug 23 '18 at 05:37
  • 3
    I've read this page like a dozen times and somehow until today I missed this answer which is perfect! – sylbru Oct 15 '18 at 15:36
  • 7
    Can't believe the most upvoted answers are about hardcoding character maps. **This** should be the accepted answer. – BenMorel Nov 14 '19 at 18:19
  • This answer led me to the right solution for my needs. The rules here are too strong, I used only those rules to remove accents `':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;'` – Peeter Rannou - thedotwriter Oct 08 '20 at 13:20
  • More about it here: https://www.php.net/manual/en/transliterator.transliterate.php#111939 – Szabolcs Páll Jul 07 '21 at 11:56
  • Much better solution than iconv with ASCII//TRANSLIT I was using until then (iconv requires a setlocale call first) – Pierre-Olivier Vares Aug 03 '21 at 12:15
  • 1
    Note: it requires `php-intl` module to be installed and enabled in config. Make sure you have it available if you plan to use this solution. – Filip Happy Jun 07 '22 at 18:16
24

The easiest way is to use PHP's native iconv() function.

 echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', "Thîs îs à vêry wrong séntènce!");

 // output: This is a very wrong sentence!
Waiyl Karim
  • 3
    This is not very reliable `echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', 'usuario o contraseña incorrectos');` outputs `usuario o contrase?a incorrectos` – Stan Dec 08 '14 at 16:59
  • 3
    For your example you could use: `setlocale(LC_CTYPE, 'cs_CZ'); echo iconv('UTF-8', 'ASCII//TRANSLIT', "usuario o contraseña incorrectos"); // output: usuario o contrasena incorrectos`. Please refer to PHP Documentation for more info. Everything is there! http://php.net/manual/en/function.iconv.php – Waiyl Karim Dec 08 '14 at 21:59
  • 2
    The iconv UTF8 to ASCII transliterations seem to be very strange. I get "usuario o contrase~na incorrectos" for my locale. It converts things like ñ to ~n and ö to :o I have found that transliterating to latin1 then converting extended ascii manually works best. – Phil Jan 27 '15 at 23:40
  • @Phil_1984_ What's your locale? – mpen Jul 28 '15 at 17:17
  • @Mark Just realised my setlocale command was returning false on my windows. It should be using "en_UK" tho. Maybe a windows bug with my iconv. – Phil Jul 28 '15 at 22:35
  • @Stan looks like your ñ character isn't proper utf-8. Check your .php file encoding. – Phil Jul 28 '15 at 22:35
  • @Phil_1984_ `setlocale(LC_CTYPE,'en_UK');echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', "usuario o contraseña incorrectos");` comes out correctly. Do you know how to reliably reproduce the issue? – mpen Jul 28 '15 at 22:52
  • @Phil It's not en_UK but en_GB. UK is not the country code of the UK, the TLD .uk is just an exception. The traditional and official name here is still GB for Great Britain. – ygoe Apr 19 '23 at 20:16
17

When using iconv, the locale must be set:

function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ')
{
    echo '<tt>';
    echo iconv('UTF-8', 'ascii//TRANSLIT', $text);
    echo '</tt><br/>';
} 

test_enc();
setlocale(LC_ALL, 'cs_CZ.utf8');
test_enc();
setlocale(LC_ALL, 'en_US.utf8');
test_enc();

Yields:

????????? ????????? f?? b?r F?? B?R ae
escrzyaie ESCRZYAIE fo? bar FO? BAR ae
escrzyaie ESCRZYAIE fo? bar FO? BAR ae

I don't have any locales other than cs_CZ and en_US installed, so I can't test others.

In C# I have seen a solution that converts the string to a Unicode normalized form: the accents are split out from their base letters and then filtered out via the nonspacing-mark Unicode category.
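
The same normalize-then-filter idea can be sketched in PHP. This assumes the intl extension (which provides the Normalizer class) is installed; strip_marks() is a hypothetical helper name used only for illustration.

```php
<?php
// Sketch only: strip_marks() is a made-up name, and the intl extension
// must be available for the Normalizer class to exist.
function strip_marks(string $text): string
{
    // Decompose each accented letter into a base letter plus combining marks.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8; leave it unchanged
    }
    // Remove the combining marks (Unicode category Mn, "nonspacing mark").
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $decomposed;
}

echo strip_marks("Fóø Bår"), "\n"; // ø has no decomposition, so it is kept
```

Letters such as ø, ł or đ are not base letter plus accent in Unicode, so this approach leaves them as they are; a small lookup table would still be needed for those.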

langpavel
11

Indeed, it is a matter of taste. There are many flavors for converting such letters.

function replaceAccents($str)
{
  $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
  $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
  return str_replace($a, $b, $str);
}
Junior Mayhé
9

I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

http://ie2.php.net/strtr

Jeremy Smyth
  • 2
    I think you should probably suggest mb_strstr() instead, as his input is UTF8 – karim79 Jun 19 '09 at 12:23
  • 3
    The //TRANSLIT in the iconv call is meant to convert to the nearest available alternative in the target encoding. This should include removing accents, or converting a single character into two, e.g. ñ might become n~ – georgebrock Jun 19 '09 at 12:23
  • Since the server doesn't support iconv properly, looks like I'll be doing it this way afterall. Thanks Jeremy. – georgebrock Jun 19 '09 at 16:47
  • 3
    @karim79 `mb_strstr` is the wrong function, and there is no `mb_strtr` – philfreo Nov 10 '10 at 19:33
7

Here is a simple function that I usually use to remove accents:

function str_without_accents($str, $charset='utf-8')
{
    $str = htmlentities($str, ENT_NOQUOTES, $charset);

    $str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '\1', $str);
    $str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '\1', $str); // for ligatures, e.g. '&oelig;'
    $str = preg_replace('#&[^;]+;#', '', $str); // remove any other entities

    return $str;   // or add this : mb_strtoupper($str); for uppercase :)
}
Mimouni
  • This does not seem very future-proof. If further HTML entities are allocated in future then they might coincidentally match your regex (lots of words end in "ring", for example). – equin0x80 Apr 21 '20 at 15:51
6

You could use urlencode. Does not quite do what you want (remove accents), but will give you a url usable string

$output = urlencode ($input);

In Perl I could use a translate regex, but I cannot think of the PHP equivalent

$input =~ tr/áâàå/aaaa/;

etc...

you could do this using preg_replace

$patterns[0] = '/[áâàåä]/u';
$patterns[1] = '/[ðéêèë]/u';
$patterns[2] = '/[íîìï]/u';
$patterns[3] = '/[óôòøõö]/u';
$patterns[4] = '/[úûùü]/u';
$patterns[5] = '/æ/u';
$patterns[6] = '/ç/u';
$patterns[7] = '/ß/u';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';

$output = preg_replace($patterns, $replacements, $input);

(Please note this was typed from a foggy, beer-ridden Friday afternoon memory, so it may not be 100% correct.)

or you could make a hash table and do a replacement based off of that.
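
The hash-table idea could look roughly like the sketch below. The map is deliberately tiny and replace_accents() is a hypothetical name; it also assumes the source file is saved as UTF-8 so that the keys match UTF-8 input.

```php
<?php
// Illustrative only: extend the table with whatever characters you need.
// replace_accents() is a made-up name, not an existing PHP function.
function replace_accents(string $input): string
{
    $map = [
        'á' => 'a', 'â' => 'a', 'à' => 'a', 'å' => 'a', 'ä' => 'a',
        'é' => 'e', 'ê' => 'e', 'è' => 'e', 'ë' => 'e',
        'í' => 'i', 'î' => 'i', 'ì' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ô' => 'o', 'ò' => 'o', 'ø' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ú' => 'u', 'û' => 'u', 'ù' => 'u', 'ü' => 'u',
        'æ' => 'ae', 'ç' => 'c', 'ß' => 'ss',
    ];
    // With an array argument, strtr() matches whole keys (byte sequences),
    // so multi-byte UTF-8 characters are replaced safely.
    return strtr($input, $map);
}

echo replace_accents("Fóø Bår"), "\n"; // prints "Foo Bar"
```

The array form of strtr() needs no regex engine and does not re-scan text it has already replaced, which keeps one-to-many replacements such as ß => ss predictable.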

Xetius
4

I just created a removeAccents method based on the reading of this thread and this other one too (How to remove accents and turn letters into "plain" ASCII characters?).

The method is here: https://github.com/lingtalfi/Bat/blob/master/StringTool.md#removeaccents

Tests are here: https://github.com/lingtalfi/Bat/blob/master/btests/StringTool/removeAccents/stringTool.removeAccents.test.php,

and here is what was tested so far:

$a = [
    // easy
    '',
    'a',
    'après',
    'dédé fait la fête ?',
    // hard
    'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ',
    'ŻŹĆŃĄŚŁĘÓżźćńąśłęó',
    'qqqqŻŹĆŃĄŚŁĘÓżźćńąśłęóqqq',
    'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöøùúûüýÿ',       
    'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
    'ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİĴĵĶķ',
    'ĹĺĻļĽľĿŀŁłŃńŅņŇňŉŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽž',
    'ſƒƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǺǻǾǿ',
    'Ǽǽ',
];

and it converts only accented characters (letters, ligatures, cedillas, some letters with a stroke through them, ...?).

Here is the content of the method: (https://github.com/lingtalfi/Bat/blob/master/StringTool.php#L83)

public static function removeAccents($str)
{
    static $map = [
        // single letters
        'à' => 'a',
        'á' => 'a',
        'â' => 'a',
        'ã' => 'a',
        'ä' => 'a',
        'ą' => 'a',
        'å' => 'a',
        'ā' => 'a',
        'ă' => 'a',
        'ǎ' => 'a',
        'ǻ' => 'a',
        'À' => 'A',
        'Á' => 'A',
        'Â' => 'A',
        'Ã' => 'A',
        'Ä' => 'A',
        'Ą' => 'A',
        'Å' => 'A',
        'Ā' => 'A',
        'Ă' => 'A',
        'Ǎ' => 'A',
        'Ǻ' => 'A',


        'ç' => 'c',
        'ć' => 'c',
        'ĉ' => 'c',
        'ċ' => 'c',
        'č' => 'c',
        'Ç' => 'C',
        'Ć' => 'C',
        'Ĉ' => 'C',
        'Ċ' => 'C',
        'Č' => 'C',

        'ď' => 'd',
        'đ' => 'd',
        'Ð' => 'D',
        'Ď' => 'D',
        'Đ' => 'D',


        'è' => 'e',
        'é' => 'e',
        'ê' => 'e',
        'ë' => 'e',
        'ę' => 'e',
        'ē' => 'e',
        'ĕ' => 'e',
        'ė' => 'e',
        'ě' => 'e',
        'È' => 'E',
        'É' => 'E',
        'Ê' => 'E',
        'Ë' => 'E',
        'Ę' => 'E',
        'Ē' => 'E',
        'Ĕ' => 'E',
        'Ė' => 'E',
        'Ě' => 'E',

        'ƒ' => 'f',


        'ĝ' => 'g',
        'ğ' => 'g',
        'ġ' => 'g',
        'ģ' => 'g',
        'Ĝ' => 'G',
        'Ğ' => 'G',
        'Ġ' => 'G',
        'Ģ' => 'G',


        'ĥ' => 'h',
        'ħ' => 'h',
        'Ĥ' => 'H',
        'Ħ' => 'H',

        'ì' => 'i',
        'í' => 'i',
        'î' => 'i',
        'ï' => 'i',
        'ĩ' => 'i',
        'ī' => 'i',
        'ĭ' => 'i',
        'į' => 'i',
        'ſ' => 's',
        'ǐ' => 'i',
        'Ì' => 'I',
        'Í' => 'I',
        'Î' => 'I',
        'Ï' => 'I',
        'Ĩ' => 'I',
        'Ī' => 'I',
        'Ĭ' => 'I',
        'Į' => 'I',
        'İ' => 'I',
        'Ǐ' => 'I',

        'ĵ' => 'j',
        'Ĵ' => 'J',

        'ķ' => 'k',
        'Ķ' => 'K',


        'ł' => 'l',
        'ĺ' => 'l',
        'ļ' => 'l',
        'ľ' => 'l',
        'ŀ' => 'l',
        'Ł' => 'L',
        'Ĺ' => 'L',
        'Ļ' => 'L',
        'Ľ' => 'L',
        'Ŀ' => 'L',


        'ñ' => 'n',
        'ń' => 'n',
        'ņ' => 'n',
        'ň' => 'n',
        'ŉ' => 'n',
        'Ñ' => 'N',
        'Ń' => 'N',
        'Ņ' => 'N',
        'Ň' => 'N',

        'ò' => 'o',
        'ó' => 'o',
        'ô' => 'o',
        'õ' => 'o',
        'ö' => 'o',
        'ð' => 'o',
        'ø' => 'o',
        'ō' => 'o',
        'ŏ' => 'o',
        'ő' => 'o',
        'ơ' => 'o',
        'ǒ' => 'o',
        'ǿ' => 'o',
        'Ò' => 'O',
        'Ó' => 'O',
        'Ô' => 'O',
        'Õ' => 'O',
        'Ö' => 'O',
        'Ø' => 'O',
        'Ō' => 'O',
        'Ŏ' => 'O',
        'Ő' => 'O',
        'Ơ' => 'O',
        'Ǒ' => 'O',
        'Ǿ' => 'O',


        'ŕ' => 'r',
        'ŗ' => 'r',
        'ř' => 'r',
        'Ŕ' => 'R',
        'Ŗ' => 'R',
        'Ř' => 'R',


        'ś' => 's',
        'š' => 's',
        'ŝ' => 's',
        'ş' => 's',
        'Ś' => 'S',
        'Š' => 'S',
        'Ŝ' => 'S',
        'Ş' => 'S',

        'ţ' => 't',
        'ť' => 't',
        'ŧ' => 't',
        'Ţ' => 'T',
        'Ť' => 'T',
        'Ŧ' => 'T',


        'ù' => 'u',
        'ú' => 'u',
        'û' => 'u',
        'ü' => 'u',
        'ũ' => 'u',
        'ū' => 'u',
        'ŭ' => 'u',
        'ů' => 'u',
        'ű' => 'u',
        'ų' => 'u',
        'ư' => 'u',
        'ǔ' => 'u',
        'ǖ' => 'u',
        'ǘ' => 'u',
        'ǚ' => 'u',
        'ǜ' => 'u',
        'Ù' => 'U',
        'Ú' => 'U',
        'Û' => 'U',
        'Ü' => 'U',
        'Ũ' => 'U',
        'Ū' => 'U',
        'Ŭ' => 'U',
        'Ů' => 'U',
        'Ű' => 'U',
        'Ų' => 'U',
        'Ư' => 'U',
        'Ǔ' => 'U',
        'Ǖ' => 'U',
        'Ǘ' => 'U',
        'Ǚ' => 'U',
        'Ǜ' => 'U',


        'ŵ' => 'w',
        'Ŵ' => 'W',

        'ý' => 'y',
        'ÿ' => 'y',
        'ŷ' => 'y',
        'Ý' => 'Y',
        'Ÿ' => 'Y',
        'Ŷ' => 'Y',

        'ż' => 'z',
        'ź' => 'z',
        'ž' => 'z',
        'Ż' => 'Z',
        'Ź' => 'Z',
        'Ž' => 'Z',


        // accented ligatures
        'Ǽ' => 'A',
        'ǽ' => 'a',
    ];
    return strtr($str, $map);
}
ling
4

What's wrong with this one? Works with UTF8

function strip_accents($s){
  return str_replace(
    explode(' ', preg_replace('/ +/', ' ', 'č ć ž š đ  Č Ć Ž Š Đ  à á â ã ä ç è é ê ë ì í î ï ñ ò ó ô õ ö ù ú û ü ý ÿ À Á Â Ã Ä Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý')),
    explode(' ', preg_replace('/ +/', ' ', 'c c z s dj C C Z S DJ a a a a a c e e e e i i i i n o o o o o u u u u y y A A A A A C E E E E I I I I N O O O O O U U U U Y')),
    $s);
}

It can be faster by not using preg_replace, but speed was not my goal here.

Vladan
2

I agree with georgebrock's comment.

If you find a way to get //TRANSLIT to work, you can build friendly URLs:

  1. use iconv with //TRANSLIT (ñ => n~)
  2. remove non-alphanumeric, non-whitespace characters inside words: $url = preg_replace( '/(\w)[^\w\s](\w)/', '$1$2', $url );
  3. replace the remaining separators: $url = preg_replace( '/[^a-z0-9]+/', '-', $url );
  4. remove double/leading/trailing '-', e.g. $url = preg_replace( '/(?:(^|\-)\-+|\-$)/', '', $url );

If you can't get it to work, replace step 1 with strtr/character-based replacement, like Xetius' solution.
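
The clean-up steps that follow the iconv call can be sketched as one function. slugify_ascii() is a hypothetical name, the input is assumed to be ASCII already (for example, the output of the transliteration step), and two details are my own choices rather than part of the list: the string is lower-cased first, and trim() stands in for the last regex.

```php
<?php
// Sketch only: slugify_ascii() is a made-up name and expects ASCII input.
function slugify_ascii(string $text): string
{
    $url = strtolower($text); // assumption: slugs should be lower-case
    // Drop a lone symbol sitting between two word characters (n~o => no).
    $url = preg_replace('/(\w)[^\w\s](\w)/', '$1$2', $url);
    // Collapse every remaining run of non-alphanumerics into one dash.
    $url = preg_replace('/[^a-z0-9]+/', '-', $url);
    // trim() is used here instead of a regex to strip leading/trailing dashes.
    return trim($url, '-');
}

echo slugify_ascii("Fo'o  Ba~r!"), "\n"; // prints "foo-bar"
```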

instanceof me
2

I can't reproduce your problem. I get the expected result.

How exactly are you using mb_detect_encoding() to verify your string is in fact UTF-8?

If I simply call mb_detect_encoding($input) on both a UTF-8 and ISO-8859-1 encoded version of your string, both of them return "UTF-8", so that function isn't particularly reliable.

iconv() gives me a PHP "notice" when it gets the wrongly encoded string and only echoes "F", but that might just be because of different PHP/iconv settings/versions (?).

I suggest you try calling mb_check_encoding($input, "utf-8") first to verify that your string really is UTF-8. I think it probably isn't.

mercator
  • Thanks for the tip. mb_check_encoding($input, "utf-8") is returning TRUE. Also, I was already using error_reporting(E_ALL) so there shouldn't be any errors slipping past me. – georgebrock Jun 19 '09 at 14:33
  • Hmmm, I see your point. I tried it on another machine now and that returns "Fo? Bar". What PHP and iconv versions are you using? – mercator Jun 19 '09 at 15:29
  • 2
    I think it is the iconv version that is at fault - this server is using the glibc version instead of the libiconv version. – georgebrock Jun 19 '09 at 16:41
  • Thanks mercator, you were really helpful. – georgebrock Jun 19 '09 at 16:48
  • Thanks for your explanation as well. I didn't realise it wasn't just version numbers. The difference on my end was also due to the different iconv implentations. – mercator Jun 20 '09 at 14:30
  • Fail with this character: `ǻ` and `iconv()` send warning... ilegal string –  May 03 '19 at 20:03
2

Merged Cazuma Nii Cavalcanti's implementation with Junior Mayhé's char list, hoping to save some time for some of you.

function stripAccents($str) {
    return strtr(utf8_decode($str), utf8_decode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïñòóôõöøùúûüýÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƒƠơƯưǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǺǻǼǽǾǿ'), 'AAAAAAAECEEEEIIIIDNOOOOOOUUUUYsaaaaaaaeceeeeiiiinoooooouuuuyyAaAaAaCcCcCcCcDdDdEeEeEeEeEeGgGgGgGgHhHhIiIiIiIiIiIJijJjKkLlLlLlLlllNnNnNnnOoOoOoOEoeRrRrRrSsSsSsSsTtTtTtUuUuUuUuUuUuWwYyYZzZzZzsfOoUuAaIiOoUuUuUuUuUuAaAEaeOo');
}
  • 1
    You are trying to utf8_decode non latin1 characters which will give you back the '?' character. Also your strings are different lengths because of the Ǽ to AE conversions. This method wont work. – Phil Jan 27 '15 at 23:35
2

In Laravel you can simply use str_slug($accentedPhrase). It joins the words with a dash (-); if you would rather have spaces, you can use str_replace('-', ' ', str_slug($accentedPhrase)).

aPa
  • 237
  • 3
  • 6
  • 3
    You don't need str_replace; you can pass a space as the second argument: `str_slug($word, ' ');` – PayteR Jan 23 '19 at 06:57
2

Something like this?

$arrSearch  = explode(","," ,ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,e,i,ø,u");

$arrReplace = explode(",","_,c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,e,i,o,u");

$output = str_replace($arrSearch, $arrReplace, $input);
lluisma
  • 101
  • 1
  • 6
2

If the main task is just to use the string in a URL, why not use a slugifier?

composer require cocur/slugify

then

use Cocur\Slugify\Slugify;

$slugify = new Slugify();
echo $slugify->slugify('Fóø Bår');

It also has bridges for many popular frameworks. For example, you can use the Doctrine Extensions Sluggable behaviour to automatically generate a unique slug for each entity in the database and use it in the URL.

If you just want to wipe out all accents, you can play around with the rulesets to meet your requirements.

2
function Unaccent($string){
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));
}
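The function above comes without explanation, so here is a hedged two-step sketch of what it appears to rely on: htmlentities() turns an accented letter into a named entity such as &oacute;, and the regular expression then keeps only the base letter(s) of that entity. The name unaccentViaEntities is made up for this illustration.

```php
<?php
// Hypothetical name; same idea as the function above, split into two steps.
function unaccentViaEntities(string $utf8): string
{
    // Step 1: "ó" becomes "&oacute;", "ø" becomes "&oslash;", and so on.
    $entities = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

    // Step 2: keep the base letter(s) of each named entity, drop the suffix.
    $result = preg_replace(
        '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $entities
    );

    // preg_replace() returns null on a PCRE error; fall back to step 1 output.
    return $result ?? $entities;
}

echo unaccentViaEntities("F\xC3\xB3\xC3\xB8 B\xC3\xA5r"), "\n"; // Foo Bar
```

Two caveats follow from the approach: characters that have no matching named entity are left as they are, and the result is HTML-escaped (an & in the input comes back as &amp;), so it may need html_entity_decode() afterwards.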
  • Reference: https://gist.github.com/evaisse/169594?permalink_comment_id=4048789#gistcomment-4048789 – Fabrizio Valencia Aug 02 '22 at 18:44
1

I put this answer together from tips found here, so it is not really mine. It works for me with LATIN1 or UTF-8 input. If you use other charsets, you should probably add them to the mb_detect_encoding() list. A correctly configured environment (locale) is probably needed as well.

function NoAccents($s){
        return iconv(mb_detect_encoding($s,'UTF-8, ASCII, ISO-8859-1'),'ASCII//TRANSLIT//IGNORE',$s);
}
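Because the result of //TRANSLIT depends on the iconv implementation and locale, it may help to detect when the conversion did not really work. The following is only a sketch under that assumption; translitOrNull is a hypothetical name, and the "?" counting is a heuristic, not something iconv documents.

```php
<?php
// Hypothetical wrapper: returns null instead of a string full of "?" marks.
function translitOrNull(string $utf8): ?string
{
    // "@" silences the notice some iconv builds raise on untranslatable input.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $utf8);
    if ($ascii === false) {
        return null; // conversion refused outright
    }
    // More "?" in the output than in the input means characters were lost.
    if (substr_count($ascii, '?') > substr_count($utf8, '?')) {
        return null;
    }
    return $ascii;
}

$result = translitOrNull("F\xC3\xB3\xC3\xB8 B\xC3\xA5r");
echo $result === null ? "iconv could not transliterate\n" : $result . "\n";
```

A null result is the cue to fall back to one of the table-based approaches. Note that this check does not catch characters that //IGNORE drops silently.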
helviojr
  • 21
  • 1
  • 1
    For `Fóø Bår`, I actually only got `Fo? Bar`. Could not have `ø` character translated to `o`. Tried changing my environment to no_NO, da_DK, but it did not interfere. Using `setlocale(LC_CTYPE,'da_DK')` I've got `Fo? Baar`. – helviojr Aug 08 '20 at 00:23
1

An old topic, but I'm adding what I found works very well for me. You can customise the transliterator for your needs.

 $transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Upper(); :: NFC;', Transliterator::FORWARD);

return $transliterator->transliterate('çÇæ λώπηξ-- É&é-è_çà=@46/,*');

Output: "CCAE LOPEX-- E&E-E_CA=@46/,*";

Doc: https://www.php.net/manual/en/class.transliterator.php

Michaël
  • 141
  • 1
  • 7
1

All of these are wrong. https://stackoverflow.com/a/35177899/308851 gets close but it gets Latin involved and gives no source for the rule either.

Let's check the standard... the library provided by the Unicode Consortium is ICU and the documentation has this to say:

For example, to remove accents from characters, use the following transform:

NFD; [:Nonspacing Mark:] Remove; NFC.

This transform separates accents from their base characters, removes the accents, and then puts the remaining text into an unaccented form.

That's it. No need for anything else.

Thus, if you have intl installed then you can do

$transliterator = Transliterator::createFromRules(':: NFD; :: [:Mn:] Remove; :: NFC;');
echo $transliterator->transliterate($string);

That's it. That's the answer to the question.
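To see what the rule does at each stage, here is a step-by-step sketch that uses Normalizer (also part of intl) and a PCRE \p{Mn} class in place of the Transliterator. It is an illustration of the same three steps, not a replacement for the code above, and stripMarks is a made-up name.

```php
<?php
// Hypothetical helper mirroring "NFD; [:Mn:] Remove; NFC" step by step.
function stripMarks(string $utf8): ?string
{
    if (!class_exists('Normalizer')) {
        return null; // intl is not installed
    }
    // NFD: "ó" becomes "o" followed by U+0301 (combining acute accent).
    $decomposed = Normalizer::normalize($utf8, Normalizer::FORM_D);
    if ($decomposed === false) {
        return null; // input was not valid UTF-8
    }
    // Remove every nonspacing mark (general category Mn).
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    if ($stripped === null) {
        return null;
    }
    // NFC: recompose whatever is left.
    $result = Normalizer::normalize($stripped, Normalizer::FORM_C);
    return $result === false ? null : $result;
}

var_dump(stripMarks("F\xC3\xB3\xC3\xB8 B\xC3\xA5r")); // "Foø Bar" when intl is available
```

The ø survives because it is a letter in its own right rather than an o plus a combining mark, which is exactly the difference between removing accents and transliterating to ASCII.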

If you need to do this somewhere where intl is not available, you can snapshot what it does on a machine which does have intl:

<?php

$transliterator = \Transliterator::createFromRules(':: NFD; :: [:Mn:] Remove; :: NFC;');
$letters = preg_grep('/\pL/u', array_map('utf8', range(0x80, 0x2000)));
$letters = array_combine($letters, $letters);
$transliterated = array_map([$transliterator, 'transliterate'], $letters);
$map = array_diff_assoc($transliterated, $letters);
print count($map);
$search  = [' => ', 'array (', ')', '  ', "\n"];
$replace = ['=>', '[', ']', '', ''];
$map = str_replace($search, $replace, var_export($map, TRUE));
file_put_contents("map.php", "<?php\n\$map = $map;");
function utf8($num)
{
  if($num<=0x7F)       return chr($num);
  if($num<=0x7FF)      return chr(($num>>6)+192).chr(($num&63)+128);
  if($num<=0xFFFF)     return chr(($num>>12)+224).chr((($num>>6)&63)+128).chr(($num&63)+128);
  if($num<=0x1FFFFF)   return chr(($num>>18)+240).chr((($num>>12)&63)+128).chr((($num>>6)&63)+128).chr(($num&63)+128);
  return '';
}
?>

Here I have snapshotted the first 8192 characters which includes most non-Hiragana/Katakana scripts. The results are 9146 bytes long (Hiragana/Katakana more than doubles that and I didn't need it) encoding 827 replacements. You can put in your codebase and use it like this:
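On PHP 7.2 or later with mbstring available, the hand-written utf8() helper in the script above could probably be swapped for the built-in mb_chr(). A small sketch under that assumption; codepointToUtf8 is an illustrative name, not part of the original script.

```php
<?php
// Illustrative wrapper around the built-in mb_chr() (PHP 7.2+, mbstring).
function codepointToUtf8(int $codepoint): string
{
    $char = mb_chr($codepoint, 'UTF-8');
    return $char === false ? '' : $char; // false for invalid code points
}

// Same range and letter filter as the snapshot script.
$letters = preg_grep('/\pL/u', array_map('codepointToUtf8', range(0x80, 0x2000)));

var_dump(codepointToUtf8(0xE9) === "\xC3\xA9");       // é -> bool(true)
var_dump(codepointToUtf8(0x1EA0) === "\xE1\xBA\xA0"); // Ạ -> bool(true)
```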

  function removeAccents($string) {
    include 'map.php';
    return strtr($string, $map);
  }

(This is only a demo; in real code you would rather inline map.php into a class constant or similar.)
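One possible shape for the class-constant variant, as a sketch only: the class name AccentStripper is hypothetical, and a three-entry excerpt stands in for the full generated map. It assumes PHP 7.1+ (for the private constant) and a source file saved as UTF-8.

```php
<?php
// Sketch only: a three-entry excerpt stands in for the full generated map.
final class AccentStripper
{
    private const MAP = ['À' => 'A', 'é' => 'e', 'ñ' => 'n'];

    public static function strip(string $string): string
    {
        // The array form of strtr() replaces whole multi-byte keys.
        return strtr($string, self::MAP);
    }
}

echo AccentStripper::strip('Àéñ'), "\n"; // Aen
```

Keeping the map in a constant avoids re-including a file on every call.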

This is similar to https://stackoverflow.com/a/29280197/308851 but it includes a much larger number of characters -- and shows you how to obtain this mapping.

Ps.: Transliterating to ASCII is an entirely different bag of hurt. This is an answer to "How do I remove accents from characters in a PHP string?"

Pps.: Here's the snapshot:

$map = ['À'=>'A','Á'=>'A','Â'=>'A','Ã'=>'A','Ä'=>'A','Å'=>'A','Ç'=>'C','È'=>'E','É'=>'E','Ê'=>'E','Ë'=>'E','Ì'=>'I','Í'=>'I','Î'=>'I','Ï'=>'I','Ñ'=>'N','Ò'=>'O','Ó'=>'O','Ô'=>'O','Õ'=>'O','Ö'=>'O','Ù'=>'U','Ú'=>'U','Û'=>'U','Ü'=>'U','Ý'=>'Y','à'=>'a','á'=>'a','â'=>'a','ã'=>'a','ä'=>'a','å'=>'a','ç'=>'c','è'=>'e','é'=>'e','ê'=>'e','ë'=>'e','ì'=>'i','í'=>'i','î'=>'i','ï'=>'i','ñ'=>'n','ò'=>'o','ó'=>'o','ô'=>'o','õ'=>'o','ö'=>'o','ù'=>'u','ú'=>'u','û'=>'u','ü'=>'u','ý'=>'y','ÿ'=>'y','Ā'=>'A','ā'=>'a','Ă'=>'A','ă'=>'a','Ą'=>'A','ą'=>'a','Ć'=>'C','ć'=>'c','Ĉ'=>'C','ĉ'=>'c','Ċ'=>'C','ċ'=>'c','Č'=>'C','č'=>'c','Ď'=>'D','ď'=>'d','Ē'=>'E','ē'=>'e','Ĕ'=>'E','ĕ'=>'e','Ė'=>'E','ė'=>'e','Ę'=>'E','ę'=>'e','Ě'=>'E','ě'=>'e','Ĝ'=>'G','ĝ'=>'g','Ğ'=>'G','ğ'=>'g','Ġ'=>'G','ġ'=>'g','Ģ'=>'G','ģ'=>'g','Ĥ'=>'H','ĥ'=>'h','Ĩ'=>'I','ĩ'=>'i','Ī'=>'I','ī'=>'i','Ĭ'=>'I','ĭ'=>'i','Į'=>'I','į'=>'i','İ'=>'I','Ĵ'=>'J','ĵ'=>'j','Ķ'=>'K','ķ'=>'k','Ĺ'=>'L','ĺ'=>'l','Ļ'=>'L','ļ'=>'l','Ľ'=>'L','ľ'=>'l','Ń'=>'N','ń'=>'n','Ņ'=>'N','ņ'=>'n','Ň'=>'N','ň'=>'n','Ō'=>'O','ō'=>'o','Ŏ'=>'O','ŏ'=>'o','Ő'=>'O','ő'=>'o','Ŕ'=>'R','ŕ'=>'r','Ŗ'=>'R','ŗ'=>'r','Ř'=>'R','ř'=>'r','Ś'=>'S','ś'=>'s','Ŝ'=>'S','ŝ'=>'s','Ş'=>'S','ş'=>'s','Š'=>'S','š'=>'s','Ţ'=>'T','ţ'=>'t','Ť'=>'T','ť'=>'t','Ũ'=>'U','ũ'=>'u','Ū'=>'U','ū'=>'u','Ŭ'=>'U','ŭ'=>'u','Ů'=>'U','ů'=>'u','Ű'=>'U','ű'=>'u','Ų'=>'U','ų'=>'u','Ŵ'=>'W','ŵ'=>'w','Ŷ'=>'Y','ŷ'=>'y','Ÿ'=>'Y','Ź'=>'Z','ź'=>'z','Ż'=>'Z','ż'=>'z','Ž'=>'Z','ž'=>'z','Ơ'=>'O','ơ'=>'o','Ư'=>'U','ư'=>'u','Ǎ'=>'A','ǎ'=>'a','Ǐ'=>'I','ǐ'=>'i','Ǒ'=>'O','ǒ'=>'o','Ǔ'=>'U','ǔ'=>'u','Ǖ'=>'U','ǖ'=>'u','Ǘ'=>'U','ǘ'=>'u','Ǚ'=>'U','ǚ'=>'u','Ǜ'=>'U','ǜ'=>'u','Ǟ'=>'A','ǟ'=>'a','Ǡ'=>'A','ǡ'=>'a','Ǣ'=>'Æ','ǣ'=>'æ','Ǧ'=>'G','ǧ'=>'g','Ǩ'=>'K','ǩ'=>'k','Ǫ'=>'O','ǫ'=>'o','Ǭ'=>'O','ǭ'=>'o','Ǯ'=>'Ʒ','ǯ'=>'ʒ','ǰ'=>'j','Ǵ'=>'G','ǵ'=>'g','Ǹ'=>'N','ǹ'=>'n','Ǻ'=>'A','ǻ'=>'a','Ǽ'=>'Æ','ǽ'=>'æ','Ǿ'=>'Ø','ǿ'=>'ø','Ȁ'=>'A','ȁ'=>'a','Ȃ'=>'A','ȃ'=>'a','Ȅ'=>'E','ȅ'=>'e','Ȇ'=>'E','ȇ'=>'e','Ȉ'=>'I','ȉ'=>'i','Ȋ'=>'I','ȋ'=>'i','Ȍ'=>'O','ȍ'
=>'o','Ȏ'=>'O','ȏ'=>'o','Ȑ'=>'R','ȑ'=>'r','Ȓ'=>'R','ȓ'=>'r','Ȕ'=>'U','ȕ'=>'u','Ȗ'=>'U','ȗ'=>'u','Ș'=>'S','ș'=>'s','Ț'=>'T','ț'=>'t','Ȟ'=>'H','ȟ'=>'h','Ȧ'=>'A','ȧ'=>'a','Ȩ'=>'E','ȩ'=>'e','Ȫ'=>'O','ȫ'=>'o','Ȭ'=>'O','ȭ'=>'o','Ȯ'=>'O','ȯ'=>'o','Ȱ'=>'O','ȱ'=>'o','Ȳ'=>'Y','ȳ'=>'y','ʹ'=>'ʹ','Ά'=>'Α','Έ'=>'Ε','Ή'=>'Η','Ί'=>'Ι','Ό'=>'Ο','Ύ'=>'Υ','Ώ'=>'Ω','ΐ'=>'ι','Ϊ'=>'Ι','Ϋ'=>'Υ','ά'=>'α','έ'=>'ε','ή'=>'η','ί'=>'ι','ΰ'=>'υ','ϊ'=>'ι','ϋ'=>'υ','ό'=>'ο','ύ'=>'υ','ώ'=>'ω','ϓ'=>'ϒ','ϔ'=>'ϒ','Ѐ'=>'Е','Ё'=>'Е','Ѓ'=>'Г','Ї'=>'І','Ќ'=>'К','Ѝ'=>'И','Ў'=>'У','Й'=>'И','й'=>'и','ѐ'=>'е','ё'=>'е','ѓ'=>'г','ї'=>'і','ќ'=>'к','ѝ'=>'и','ў'=>'у','Ѷ'=>'Ѵ','ѷ'=>'ѵ','Ӂ'=>'Ж','ӂ'=>'ж','Ӑ'=>'А','ӑ'=>'а','Ӓ'=>'А','ӓ'=>'а','Ӗ'=>'Е','ӗ'=>'е','Ӛ'=>'Ә','ӛ'=>'ә','Ӝ'=>'Ж','ӝ'=>'ж','Ӟ'=>'З','ӟ'=>'з','Ӣ'=>'И','ӣ'=>'и','Ӥ'=>'И','ӥ'=>'и','Ӧ'=>'О','ӧ'=>'о','Ӫ'=>'Ө','ӫ'=>'ө','Ӭ'=>'Э','ӭ'=>'э','Ӯ'=>'У','ӯ'=>'у','Ӱ'=>'У','ӱ'=>'у','Ӳ'=>'У','ӳ'=>'у','Ӵ'=>'Ч','ӵ'=>'ч','Ӹ'=>'Ы','ӹ'=>'ы','آ'=>'ا','أ'=>'ا','ؤ'=>'و','إ'=>'ا','ئ'=>'ي','ۀ'=>'ە','ۂ'=>'ہ','ۓ'=>'ے','ऩ'=>'न','ऱ'=>'र','ऴ'=>'ळ','क़'=>'क','ख़'=>'ख','ग़'=>'ग','ज़'=>'ज','ड़'=>'ड','ढ़'=>'ढ','फ़'=>'फ','य़'=>'य','ড়'=>'ড','ঢ়'=>'ঢ','য়'=>'য','ਲ਼'=>'ਲ','ਸ਼'=>'ਸ','ਖ਼'=>'ਖ','ਗ਼'=>'ਗ','ਜ਼'=>'ਜ','ਫ਼'=>'ਫ','ଡ଼'=>'ଡ','ଢ଼'=>'ଢ','གྷ'=>'ག','ཌྷ'=>'ཌ','དྷ'=>'ད','བྷ'=>'བ','ཛྷ'=>'ཛ','ཀྵ'=>'ཀ','ဦ'=>'ဥ','Ḁ'=>'A','ḁ'=>'a','Ḃ'=>'B','ḃ'=>'b','Ḅ'=>'B','ḅ'=>'b','Ḇ'=>'B','ḇ'=>'b','Ḉ'=>'C','ḉ'=>'c','Ḋ'=>'D','ḋ'=>'d','Ḍ'=>'D','ḍ'=>'d','Ḏ'=>'D','ḏ'=>'d','Ḑ'=>'D','ḑ'=>'d','Ḓ'=>'D','ḓ'=>'d','Ḕ'=>'E','ḕ'=>'e','Ḗ'=>'E','ḗ'=>'e','Ḙ'=>'E','ḙ'=>'e','Ḛ'=>'E','ḛ'=>'e','Ḝ'=>'E','ḝ'=>'e','Ḟ'=>'F','ḟ'=>'f','Ḡ'=>'G','ḡ'=>'g','Ḣ'=>'H','ḣ'=>'h','Ḥ'=>'H','ḥ'=>'h','Ḧ'=>'H','ḧ'=>'h','Ḩ'=>'H','ḩ'=>'h','Ḫ'=>'H','ḫ'=>'h','Ḭ'=>'I','ḭ'=>'i','Ḯ'=>'I','ḯ'=>'i','Ḱ'=>'K','ḱ'=>'k','Ḳ'=>'K','ḳ'=>'k','Ḵ'=>'K','ḵ'=>'k','Ḷ'=>'L','ḷ'=>'l','Ḹ'=>'L','ḹ'=>'l','Ḻ'=>'L','ḻ'=>'l','Ḽ'=>'L','ḽ'=>'l','Ḿ'=>'M','ḿ'=>'m','Ṁ'=>'M','ṁ'=>'m','Ṃ'=>'M','ṃ'=>'m','Ṅ'=>'N','ṅ'=>'n','Ṇ'=>'N','ṇ'=>'n','Ṉ'=>'N','ṉ'=>'n','Ṋ'=>'N','ṋ'=>'n','Ṍ'=>'O
','ṍ'=>'o','Ṏ'=>'O','ṏ'=>'o','Ṑ'=>'O','ṑ'=>'o','Ṓ'=>'O','ṓ'=>'o','Ṕ'=>'P','ṕ'=>'p','Ṗ'=>'P','ṗ'=>'p','Ṙ'=>'R','ṙ'=>'r','Ṛ'=>'R','ṛ'=>'r','Ṝ'=>'R','ṝ'=>'r','Ṟ'=>'R','ṟ'=>'r','Ṡ'=>'S','ṡ'=>'s','Ṣ'=>'S','ṣ'=>'s','Ṥ'=>'S','ṥ'=>'s','Ṧ'=>'S','ṧ'=>'s','Ṩ'=>'S','ṩ'=>'s','Ṫ'=>'T','ṫ'=>'t','Ṭ'=>'T','ṭ'=>'t','Ṯ'=>'T','ṯ'=>'t','Ṱ'=>'T','ṱ'=>'t','Ṳ'=>'U','ṳ'=>'u','Ṵ'=>'U','ṵ'=>'u','Ṷ'=>'U','ṷ'=>'u','Ṹ'=>'U','ṹ'=>'u','Ṻ'=>'U','ṻ'=>'u','Ṽ'=>'V','ṽ'=>'v','Ṿ'=>'V','ṿ'=>'v','Ẁ'=>'W','ẁ'=>'w','Ẃ'=>'W','ẃ'=>'w','Ẅ'=>'W','ẅ'=>'w','Ẇ'=>'W','ẇ'=>'w','Ẉ'=>'W','ẉ'=>'w','Ẋ'=>'X','ẋ'=>'x','Ẍ'=>'X','ẍ'=>'x','Ẏ'=>'Y','ẏ'=>'y','Ẑ'=>'Z','ẑ'=>'z','Ẓ'=>'Z','ẓ'=>'z','Ẕ'=>'Z','ẕ'=>'z','ẖ'=>'h','ẗ'=>'t','ẘ'=>'w','ẙ'=>'y','ẛ'=>'ſ','Ạ'=>'A','ạ'=>'a','Ả'=>'A','ả'=>'a','Ấ'=>'A','ấ'=>'a','Ầ'=>'A','ầ'=>'a','Ẩ'=>'A','ẩ'=>'a','Ẫ'=>'A','ẫ'=>'a','Ậ'=>'A','ậ'=>'a','Ắ'=>'A','ắ'=>'a','Ằ'=>'A','ằ'=>'a','Ẳ'=>'A','ẳ'=>'a','Ẵ'=>'A','ẵ'=>'a','Ặ'=>'A','ặ'=>'a','Ẹ'=>'E','ẹ'=>'e','Ẻ'=>'E','ẻ'=>'e','Ẽ'=>'E','ẽ'=>'e','Ế'=>'E','ế'=>'e','Ề'=>'E','ề'=>'e','Ể'=>'E','ể'=>'e','Ễ'=>'E','ễ'=>'e','Ệ'=>'E','ệ'=>'e','Ỉ'=>'I','ỉ'=>'i','Ị'=>'I','ị'=>'i','Ọ'=>'O','ọ'=>'o','Ỏ'=>'O','ỏ'=>'o','Ố'=>'O','ố'=>'o','Ồ'=>'O','ồ'=>'o','Ổ'=>'O','ổ'=>'o','Ỗ'=>'O','ỗ'=>'o','Ộ'=>'O','ộ'=>'o','Ớ'=>'O','ớ'=>'o','Ờ'=>'O','ờ'=>'o','Ở'=>'O','ở'=>'o','Ỡ'=>'O','ỡ'=>'o','Ợ'=>'O','ợ'=>'o','Ụ'=>'U','ụ'=>'u','Ủ'=>'U','ủ'=>'u','Ứ'=>'U','ứ'=>'u','Ừ'=>'U','ừ'=>'u','Ử'=>'U','ử'=>'u','Ữ'=>'U','ữ'=>'u','Ự'=>'U','ự'=>'u','Ỳ'=>'Y','ỳ'=>'y','Ỵ'=>'Y','ỵ'=>'y','Ỷ'=>'Y','ỷ'=>'y','Ỹ'=>'Y','ỹ'=>'y','ἀ'=>'α','ἁ'=>'α','ἂ'=>'α','ἃ'=>'α','ἄ'=>'α','ἅ'=>'α','ἆ'=>'α','ἇ'=>'α','Ἀ'=>'Α','Ἁ'=>'Α','Ἂ'=>'Α','Ἃ'=>'Α','Ἄ'=>'Α','Ἅ'=>'Α','Ἆ'=>'Α','Ἇ'=>'Α','ἐ'=>'ε','ἑ'=>'ε','ἒ'=>'ε','ἓ'=>'ε','ἔ'=>'ε','ἕ'=>'ε','Ἐ'=>'Ε','Ἑ'=>'Ε','Ἒ'=>'Ε','Ἓ'=>'Ε','Ἔ'=>'Ε','Ἕ'=>'Ε','ἠ'=>'η','ἡ'=>'η','ἢ'=>'η','ἣ'=>'η','ἤ'=>'η','ἥ'=>'η','ἦ'=>'η','ἧ'=>'η','Ἠ'=>'Η','Ἡ'=>'Η','Ἢ'=>'Η','Ἣ'=>'Η','Ἤ'=>'Η','Ἥ'=>'Η','Ἦ'=>'Η','Ἧ'=>'Η','ἰ'=>'ι','ἱ'=>'ι','ἲ'=>'ι','ἳ'=>'ι','ἴ'=>'ι','ἵ'=>'ι','ἶ'=>'ι','ἷ'=>'ι','Ἰ'=>'Ι','Ἱ'=>'Ι',
'Ἲ'=>'Ι','Ἳ'=>'Ι','Ἴ'=>'Ι','Ἵ'=>'Ι','Ἶ'=>'Ι','Ἷ'=>'Ι','ὀ'=>'ο','ὁ'=>'ο','ὂ'=>'ο','ὃ'=>'ο','ὄ'=>'ο','ὅ'=>'ο','Ὀ'=>'Ο','Ὁ'=>'Ο','Ὂ'=>'Ο','Ὃ'=>'Ο','Ὄ'=>'Ο','Ὅ'=>'Ο','ὐ'=>'υ','ὑ'=>'υ','ὒ'=>'υ','ὓ'=>'υ','ὔ'=>'υ','ὕ'=>'υ','ὖ'=>'υ','ὗ'=>'υ','Ὑ'=>'Υ','Ὓ'=>'Υ','Ὕ'=>'Υ','Ὗ'=>'Υ','ὠ'=>'ω','ὡ'=>'ω','ὢ'=>'ω','ὣ'=>'ω','ὤ'=>'ω','ὥ'=>'ω','ὦ'=>'ω','ὧ'=>'ω','Ὠ'=>'Ω','Ὡ'=>'Ω','Ὢ'=>'Ω','Ὣ'=>'Ω','Ὤ'=>'Ω','Ὥ'=>'Ω','Ὦ'=>'Ω','Ὧ'=>'Ω','ὰ'=>'α','ά'=>'α','ὲ'=>'ε','έ'=>'ε','ὴ'=>'η','ή'=>'η','ὶ'=>'ι','ί'=>'ι','ὸ'=>'ο','ό'=>'ο','ὺ'=>'υ','ύ'=>'υ','ὼ'=>'ω','ώ'=>'ω','ᾀ'=>'α','ᾁ'=>'α','ᾂ'=>'α','ᾃ'=>'α','ᾄ'=>'α','ᾅ'=>'α','ᾆ'=>'α','ᾇ'=>'α','ᾈ'=>'Α','ᾉ'=>'Α','ᾊ'=>'Α','ᾋ'=>'Α','ᾌ'=>'Α','ᾍ'=>'Α','ᾎ'=>'Α','ᾏ'=>'Α','ᾐ'=>'η','ᾑ'=>'η','ᾒ'=>'η','ᾓ'=>'η','ᾔ'=>'η','ᾕ'=>'η','ᾖ'=>'η','ᾗ'=>'η','ᾘ'=>'Η','ᾙ'=>'Η','ᾚ'=>'Η','ᾛ'=>'Η','ᾜ'=>'Η','ᾝ'=>'Η','ᾞ'=>'Η','ᾟ'=>'Η','ᾠ'=>'ω','ᾡ'=>'ω','ᾢ'=>'ω','ᾣ'=>'ω','ᾤ'=>'ω','ᾥ'=>'ω','ᾦ'=>'ω','ᾧ'=>'ω','ᾨ'=>'Ω','ᾩ'=>'Ω','ᾪ'=>'Ω','ᾫ'=>'Ω','ᾬ'=>'Ω','ᾭ'=>'Ω','ᾮ'=>'Ω','ᾯ'=>'Ω','ᾰ'=>'α','ᾱ'=>'α','ᾲ'=>'α','ᾳ'=>'α','ᾴ'=>'α','ᾶ'=>'α','ᾷ'=>'α','Ᾰ'=>'Α','Ᾱ'=>'Α','Ὰ'=>'Α','Ά'=>'Α','ᾼ'=>'Α','ι'=>'ι','ῂ'=>'η','ῃ'=>'η','ῄ'=>'η','ῆ'=>'η','ῇ'=>'η','Ὲ'=>'Ε','Έ'=>'Ε','Ὴ'=>'Η','Ή'=>'Η','ῌ'=>'Η','ῐ'=>'ι','ῑ'=>'ι','ῒ'=>'ι','ΐ'=>'ι','ῖ'=>'ι','ῗ'=>'ι','Ῐ'=>'Ι','Ῑ'=>'Ι','Ὶ'=>'Ι','Ί'=>'Ι','ῠ'=>'υ','ῡ'=>'υ','ῢ'=>'υ','ΰ'=>'υ','ῤ'=>'ρ','ῥ'=>'ρ','ῦ'=>'υ','ῧ'=>'υ','Ῠ'=>'Υ','Ῡ'=>'Υ','Ὺ'=>'Υ','Ύ'=>'Υ','Ῥ'=>'Ρ','ῲ'=>'ω','ῳ'=>'ω','ῴ'=>'ω','ῶ'=>'ω','ῷ'=>'ω','Ὸ'=>'Ο','Ό'=>'Ο','Ὼ'=>'Ω','Ώ'=>'Ω','ῼ'=>'Ω'];
chx
  • 11,270
  • 7
  • 55
  • 129
0

You can pass strtr() an array of key => value pairs; this form is safe for UTF-8 characters even though they are multi-byte.

function no_accent($str){
    $accents = array('À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'Ā' => 'A', 'ā' => 'a', 'Ă' => 'A', 'ă' => 'a', 'Ą' => 'A', 'ą' => 'a', 'Ç' => 'C', 'ç' => 'c', 'Ć' => 'C', 'ć' => 'c', 'Ĉ' => 'C', 'ĉ' => 'c', 'Ċ' => 'C', 'ċ' => 'c', 'Č' => 'C', 'č' => 'c', 'Ð' => 'D', 'ð' => 'd', 'Ď' => 'D', 'ď' => 'd', 'Đ' => 'D', 'đ' => 'd', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'Ē' => 'E', 'ē' => 'e', 'Ĕ' => 'E', 'ĕ' => 'e', 'Ė' => 'E', 'ė' => 'e', 'Ę' => 'E', 'ę' => 'e', 'Ě' => 'E', 'ě' => 'e', 'Ĝ' => 'G', 'ĝ' => 'g', 'Ğ' => 'G', 'ğ' => 'g', 'Ġ' => 'G', 'ġ' => 'g', 'Ģ' => 'G', 'ģ' => 'g', 'Ĥ' => 'H', 'ĥ' => 'h', 'Ħ' => 'H', 'ħ' => 'h', 'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'Ĩ' => 'I', 'ĩ' => 'i', 'Ī' => 'I', 'ī' => 'i', 'Ĭ' => 'I', 'ĭ' => 'i', 'Į' => 'I', 'į' => 'i', 'İ' => 'I', 'ı' => 'i', 'Ĵ' => 'J', 'ĵ' => 'j', 'Ķ' => 'K', 'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'L', 'ĺ' => 'l', 'Ļ' => 'L', 'ļ' => 'l', 'Ľ' => 'L', 'ľ' => 'l', 'Ŀ' => 'L', 'ŀ' => 'l', 'Ł' => 'L', 'ł' => 'l', 'Ñ' => 'N', 'ñ' => 'n', 'Ń' => 'N', 'ń' => 'n', 'Ņ' => 'N', 'ņ' => 'n', 'Ň' => 'N', 'ň' => 'n', 'ʼn' => 'n', 'Ŋ' => 'N', 'ŋ' => 'n', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'Ō' => 'O', 'ō' => 'o', 'Ŏ' => 'O', 'ŏ' => 'o', 'Ő' => 'O', 'ő' => 'o', 'Ŕ' => 'R', 'ŕ' => 'r', 'Ŗ' => 'R', 'ŗ' => 'r', 'Ř' => 'R', 'ř' => 'r', 'Ś' => 'S', 'ś' => 's', 'Ŝ' => 'S', 'ŝ' => 's', 'Ş' => 'S', 'ş' => 's', 'Š' => 'S', 'š' => 's', 'ſ' => 's', 'Ţ' => 'T', 'ţ' => 't', 'Ť' => 'T', 'ť' => 't', 'Ŧ' => 'T', 'ŧ' => 't', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'Ũ' => 'U', 'ũ' => 'u', 'Ū' => 'U', 'ū' => 'u', 'Ŭ' => 'U', 'ŭ' => 'u', 'Ů' => 'U', 'ů' => 'u', 'Ű' => 'U', 'ű' => 
'u', 'Ų' => 'U', 'ų' => 'u', 'Ŵ' => 'W', 'ŵ' => 'w', 'Ý' => 'Y', 'ý' => 'y', 'ÿ' => 'y', 'Ŷ' => 'Y', 'ŷ' => 'y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'ź' => 'z', 'Ż' => 'Z', 'ż' => 'z', 'Ž' => 'Z', 'ž' => 'z');
    return strtr($str, $accents);
}

Plus, you skip the UTF-8 decode/encode step.
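A short sketch of why the array form matters. The bytes are written as escapes so the example does not depend on the file's encoding.

```php
<?php
$e = "\xC3\xA9"; // "é" as two UTF-8 bytes

// Three-argument form works byte by byte: only the first byte of "é" is
// mapped, so the second byte is left behind as garbage.
var_dump(strtr($e, $e, 'e') === "e\xA9");   // bool(true)

// Array form replaces the whole two-byte key at once.
var_dump(strtr($e, [$e => 'e']) === 'e');   // bool(true)

// Array values may also be longer than one character.
var_dump(strtr("\xC3\x86", ["\xC3\x86" => 'AE']) === 'AE'); // Æ -> bool(true)
```

In the three-argument form, PHP ignores the extra bytes of the longer argument, which is why multi-byte characters get split.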

Capripot
  • 1,354
  • 16
  • 26
0

An improved version of the remove_accents() function, based on the formatting code in the latest WordPress version (4.3), is:

function mbstring_binary_safe_encoding( $reset = false ) {
    static $encodings = array();
    static $overloaded = null;

    if ( is_null( $overloaded ) )
        $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );

    if ( false === $overloaded )
        return;

    if ( ! $reset ) {
        $encoding = mb_internal_encoding();
        array_push( $encodings, $encoding );
        mb_internal_encoding( 'ISO-8859-1' );
    }

    if ( $reset && $encodings ) {
        $encoding = array_pop( $encodings );
        mb_internal_encoding( $encoding );
    }
}

function reset_mbstring_encoding() {
    mbstring_binary_safe_encoding( true );
}

function seems_utf8( $str ) {
    mbstring_binary_safe_encoding();
    $length = strlen($str);
    reset_mbstring_encoding();
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; // 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; // 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; // 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; // 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; // 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; // 1111110b
        else return false; // Does not match any model
        for ($j=0; $j<$n; $j++) { // n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

function remove_accents( $string ) {
    if ( !preg_match('/[\x80-\xff]/', $string) )
        return $string;

    if (seems_utf8($string)) {
        $chars = array(
            // Decompositions for Latin-1 Supplement
            chr(194).chr(170) => 'a', chr(194).chr(186) => 'o',
            chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
            chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
            chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
            chr(195).chr(134) => 'AE',chr(195).chr(135) => 'C',
            chr(195).chr(136) => 'E', chr(195).chr(137) => 'E',
            chr(195).chr(138) => 'E', chr(195).chr(139) => 'E',
            chr(195).chr(140) => 'I', chr(195).chr(141) => 'I',
            chr(195).chr(142) => 'I', chr(195).chr(143) => 'I',
            chr(195).chr(144) => 'D', chr(195).chr(145) => 'N',
            chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
            chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
            chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
            chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
            chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
            chr(195).chr(158) => 'TH',chr(195).chr(159) => 's',
            chr(195).chr(160) => 'a', chr(195).chr(161) => 'a',
            chr(195).chr(162) => 'a', chr(195).chr(163) => 'a',
            chr(195).chr(164) => 'a', chr(195).chr(165) => 'a',
            chr(195).chr(166) => 'ae',chr(195).chr(167) => 'c',
            chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
            chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
            chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
            chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
            chr(195).chr(176) => 'd', chr(195).chr(177) => 'n',
            chr(195).chr(178) => 'o', chr(195).chr(179) => 'o',
            chr(195).chr(180) => 'o', chr(195).chr(181) => 'o',
            chr(195).chr(182) => 'o', chr(195).chr(184) => 'o',
            chr(195).chr(185) => 'u', chr(195).chr(186) => 'u',
            chr(195).chr(187) => 'u', chr(195).chr(188) => 'u',
            chr(195).chr(189) => 'y', chr(195).chr(190) => 'th',
            chr(195).chr(191) => 'y', chr(195).chr(152) => 'O',
            // Decompositions for Latin Extended-A
            chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
            chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
            chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
            chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
            chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
            chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
            chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
            chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
            chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
            chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
            chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
            chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
            chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
            chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
            chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
            chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
            chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
            chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
            chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
            chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
            chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
            chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
            chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
            chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
            chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
            chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
            chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
            chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
            chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
            chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
            chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
            chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
            chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
            chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
            chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
            chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
            chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
            chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
            chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
            chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
            chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
            chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
            chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
            chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
            chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
            chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
            chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
            chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
            chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
            chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
            chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
            chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
            chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
            chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
            chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
            chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
            chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
            chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
            chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
            chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
            chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
            chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
            chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
            chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
            // Decompositions for Latin Extended-B
            chr(200).chr(152) => 'S', chr(200).chr(153) => 's',
            chr(200).chr(154) => 'T', chr(200).chr(155) => 't',
            // Euro Sign
            chr(226).chr(130).chr(172) => 'E',
            // GBP (Pound) Sign
            chr(194).chr(163) => '',
            // Vowels with diacritic (Vietnamese)
            // unmarked
            chr(198).chr(160) => 'O', chr(198).chr(161) => 'o',
            chr(198).chr(175) => 'U', chr(198).chr(176) => 'u',
            // grave accent
            chr(225).chr(186).chr(166) => 'A', chr(225).chr(186).chr(167) => 'a',
            chr(225).chr(186).chr(176) => 'A', chr(225).chr(186).chr(177) => 'a',
            chr(225).chr(187).chr(128) => 'E', chr(225).chr(187).chr(129) => 'e',
            chr(225).chr(187).chr(146) => 'O', chr(225).chr(187).chr(147) => 'o',
            chr(225).chr(187).chr(156) => 'O', chr(225).chr(187).chr(157) => 'o',
            chr(225).chr(187).chr(170) => 'U', chr(225).chr(187).chr(171) => 'u',
            chr(225).chr(187).chr(178) => 'Y', chr(225).chr(187).chr(179) => 'y',
            // hook
            chr(225).chr(186).chr(162) => 'A', chr(225).chr(186).chr(163) => 'a',
            chr(225).chr(186).chr(168) => 'A', chr(225).chr(186).chr(169) => 'a',
            chr(225).chr(186).chr(178) => 'A', chr(225).chr(186).chr(179) => 'a',
            chr(225).chr(186).chr(186) => 'E', chr(225).chr(186).chr(187) => 'e',
            chr(225).chr(187).chr(130) => 'E', chr(225).chr(187).chr(131) => 'e',
            chr(225).chr(187).chr(136) => 'I', chr(225).chr(187).chr(137) => 'i',
            chr(225).chr(187).chr(142) => 'O', chr(225).chr(187).chr(143) => 'o',
            chr(225).chr(187).chr(148) => 'O', chr(225).chr(187).chr(149) => 'o',
            chr(225).chr(187).chr(158) => 'O', chr(225).chr(187).chr(159) => 'o',
            chr(225).chr(187).chr(166) => 'U', chr(225).chr(187).chr(167) => 'u',
            chr(225).chr(187).chr(172) => 'U', chr(225).chr(187).chr(173) => 'u',
            chr(225).chr(187).chr(182) => 'Y', chr(225).chr(187).chr(183) => 'y',
            // tilde
            chr(225).chr(186).chr(170) => 'A', chr(225).chr(186).chr(171) => 'a',
            chr(225).chr(186).chr(180) => 'A', chr(225).chr(186).chr(181) => 'a',
            chr(225).chr(186).chr(188) => 'E', chr(225).chr(186).chr(189) => 'e',
            chr(225).chr(187).chr(132) => 'E', chr(225).chr(187).chr(133) => 'e',
            chr(225).chr(187).chr(150) => 'O', chr(225).chr(187).chr(151) => 'o',
            chr(225).chr(187).chr(160) => 'O', chr(225).chr(187).chr(161) => 'o',
            chr(225).chr(187).chr(174) => 'U', chr(225).chr(187).chr(175) => 'u',
            chr(225).chr(187).chr(184) => 'Y', chr(225).chr(187).chr(185) => 'y',
            // acute accent
            chr(225).chr(186).chr(164) => 'A', chr(225).chr(186).chr(165) => 'a',
            chr(225).chr(186).chr(174) => 'A', chr(225).chr(186).chr(175) => 'a',
            chr(225).chr(186).chr(190) => 'E', chr(225).chr(186).chr(191) => 'e',
            chr(225).chr(187).chr(144) => 'O', chr(225).chr(187).chr(145) => 'o',
            chr(225).chr(187).chr(154) => 'O', chr(225).chr(187).chr(155) => 'o',
            chr(225).chr(187).chr(168) => 'U', chr(225).chr(187).chr(169) => 'u',
            // dot below
            chr(225).chr(186).chr(160) => 'A', chr(225).chr(186).chr(161) => 'a',
            chr(225).chr(186).chr(172) => 'A', chr(225).chr(186).chr(173) => 'a',
            chr(225).chr(186).chr(182) => 'A', chr(225).chr(186).chr(183) => 'a',
            chr(225).chr(186).chr(184) => 'E', chr(225).chr(186).chr(185) => 'e',
            chr(225).chr(187).chr(134) => 'E', chr(225).chr(187).chr(135) => 'e',
            chr(225).chr(187).chr(138) => 'I', chr(225).chr(187).chr(139) => 'i',
            chr(225).chr(187).chr(140) => 'O', chr(225).chr(187).chr(141) => 'o',
            chr(225).chr(187).chr(152) => 'O', chr(225).chr(187).chr(153) => 'o',
            chr(225).chr(187).chr(162) => 'O', chr(225).chr(187).chr(163) => 'o',
            chr(225).chr(187).chr(164) => 'U', chr(225).chr(187).chr(165) => 'u',
            chr(225).chr(187).chr(176) => 'U', chr(225).chr(187).chr(177) => 'u',
            chr(225).chr(187).chr(180) => 'Y', chr(225).chr(187).chr(181) => 'y',
            // Vowels with diacritic (Chinese, Hanyu Pinyin)
            chr(201).chr(145) => 'a',
            // macron
            chr(199).chr(149) => 'U', chr(199).chr(150) => 'u',
            // acute accent
            chr(199).chr(151) => 'U', chr(199).chr(152) => 'u',
            // caron
            chr(199).chr(141) => 'A', chr(199).chr(142) => 'a',
            chr(199).chr(143) => 'I', chr(199).chr(144) => 'i',
            chr(199).chr(145) => 'O', chr(199).chr(146) => 'o',
            chr(199).chr(147) => 'U', chr(199).chr(148) => 'u',
            chr(199).chr(153) => 'U', chr(199).chr(154) => 'u',
            // grave accent
            chr(199).chr(155) => 'U', chr(199).chr(156) => 'u',
        );

        $string = strtr($string, $chars);
    } else {
        $chars = array();
        // Assume ISO-8859-1 if not UTF-8
        $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
            .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
            .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
            .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
            .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
            .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
            .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
            .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
            .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
            .chr(252).chr(253).chr(255);

        $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";

        $string = strtr($string, $chars['in'], $chars['out']);
        $double_chars = array();
        $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
        $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
        $string = str_replace($double_chars['in'], $double_chars['out'], $string);
    }

    return $string;
}

My answer is an update of @dynamic's solution, since Romanian diacritics (and perhaps those of other languages) weren't converted. I included only the functions that are needed, and it works like a charm.

print_r(remove_accents('Iași, Iași County, Romania'));
Community
  • 1
  • 1
0
<?php
/* 
 * Thanks:
 *   - The idea of extracting the base characters of accented letters via HTML entity conversion was taken from the ICanBoogie package by Olivier Laviale {@link http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html}
*/
function accentCharsModifier($str){
    if(($length=mb_strlen($str,"UTF-8"))<strlen($str)){
        $i=$count=0;
        while($i<$length){
            if(strlen($c=mb_substr($str,$i,1,"UTF-8"))>1){
                $he=htmlentities($c); 
                if(($nC=preg_replace("#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#", "\\1", $he))!=$he ||
                    ($nC=preg_replace("#&([A-Za-z]{2})(?:lig);#", "\\1", $he))!=$he ||
                    ($nC=preg_replace("#&[^;]+;#", "", $he))!=$he){
                    $str=str_replace($c,$nC,$str,$count);if($nC==""){$length=$length-$count;$i--;}
                }
            }
            $i++;
        }
    }
    return $str;
}
echo accentCharsModifier("&éôpkAÈû");
?>
S.Younes
  • 21
  • 3
0

Based on @Mimouni's answer, I made this function to transliterate accented strings to non-accented strings.

/**
 * Convert a string to lowercase and replace special chars with their equivalents, or remove them.
 *
 * @param string $string
 * @return string
 */
function _slugify(string $string): string
{
    $str = $string; // for comparisons
    $str = _toUtf8($str); // Force to work with string in UTF-8
    $str = iconv('UTF-8', 'ASCII//TRANSLIT', $str);

    if ($str != htmlentities($string, ENT_QUOTES, 'UTF-8')) { // iconv fails
        $str = _toUtf8($string);
        $str = htmlentities($str, ENT_QUOTES, 'UTF-8');
        $str = preg_replace('#&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);#i', '$1', $str);
        // Need to strip non ASCII chars or any other than a-z, A-Z, 0-9...
        $str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
        $str = preg_replace(array('#[^0-9a-z]#i', '#[ -]+#'), ' ', $str);
        $str = trim($str, ' -');
    }

    // lowercase
    $string = strtolower($str);

    return $string;
}

To convert strings to UTF-8, I use the Multibyte String (mbstring) extension. Note that I split the string into words and convert it word by word, to avoid trouble with mixed-encoding content (which I do have to deal with).

/**
 * @param string $str_in String in any encoding
 * @return string
 */
function _toUtf8(string $str_in): ?string
{
    if (!function_exists('mb_detect_encoding')) {
        throw new \Exception('The Multi Byte String extension is absent!');
    }
    $str_out = [];
    $words = explode(" ", $str_in);
    foreach ($words as $word) {
        $current_encoding = mb_detect_encoding($word, 'UTF-8, ASCII, ISO-8859-1');
        $str_out[] = mb_convert_encoding($word, 'UTF-8', $current_encoding);
    }
    return implode(" ", $str_out);
}
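
The detect-then-convert step can be tried on a single word. This is only a sketch: it assumes the mbstring extension is loaded, it uses strict detection (the third argument), which the function above does not, and the sample bytes are made up for illustration.

```php
<?php
// "caf\xE9" is "café" in ISO-8859-1; the lone 0xE9 byte is not valid UTF-8.
$latin1_word = "caf\xE9";

// Strict mode rejects encodings in which the bytes are invalid,
// so UTF-8 is ruled out and ISO-8859-1 is reported.
$detected = mb_detect_encoding($latin1_word, ['UTF-8', 'ISO-8859-1'], true);

// Convert to UTF-8: 0xE9 becomes the two-byte sequence 0xC3 0xA9.
$utf8_word = mb_convert_encoding($latin1_word, 'UTF-8', $detected);

echo $detected, ' ', bin2hex($utf8_word), "\n";
```

The order of the candidate list matters: encodings listed first are preferred when several of them fit the bytes.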

Footnote: this was the only solution that passed my PHPUnit tests on the Windows command line (locale issues). The @gabo solution should work, but unfortunately it did not for me.

Marcos Regis
  • 814
  • 9
  • 13
0

Combining some of the answers:

/**
 * Given a UTF-8 string, returns an ASCII string.
 * Only supports ISO-8859-1 chars, not chars like č, œ, Œ, ř, Š, š, ů, Ÿ or ž
 * Note: utf8_decode() is deprecated as of PHP 8.2.
 * @param string $utf8
 * @return string ascii
 */
function stripAccents(string $utf8): string
{
    $src = 'àáâãäåçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ';
    $dst = 'aaaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY';
    $iso8859_1 = strtr(
        utf8_decode($utf8),
        utf8_decode($src),
        $dst
    );
    // iconv below not used for UTF-8 directly because it adds chars like ^'"`
    // which are nice to not lose info, but not very readable and not compatible with filenames
    // iconv() additionally converts chars like
    // ¡ ¢ £  ¥   ¦ §  ¨  ©  ª «  ¬   SHY ®    °  ±   ²  ³  ´ µ ¶ · ¸ ¹  º »  ¼   ½   ¾   ¿ Å Æ  Ð Ø Þ  ß  æ  ÷   ø þ
    // ! c lb yen | SS " (c) a << not SHY (R)  ^0 +/- ^2 ^3 ' u P . , ^1 o >> 1/4 1/2 3/4 ? A AE D O Th ss ae d : o th
    return iconv('ISO-8859-1', 'ASCII//TRANSLIT//IGNORE', $iso8859_1);
}
PaulH
  • 2,918
  • 2
  • 15
  • 31
0

With Symfony 5 or later:

composer require symfony/string 
use Symfony\Component\String\Slugger\AsciiSlugger;

$slugger = new AsciiSlugger();
$slug = $slugger->slug('Wôrķšƥáçè ~~sèťtïñğš~~');
// $slug = 'Workspace-settings'

More: https://symfony.com/doc/current/components/string.html#slugger

DavidG
  • 301
  • 2
  • 8
-1

One of the tricks I stumbled upon on the web is to run the string through htmlentities and then strip the encoded characters:

$stripped = preg_replace('`&[^;]+;`','',htmlentities($string));

It's not perfect, but it works well in some cases.
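
To make the trade-off concrete, here is a rough sketch comparing that one-liner with a variant that keeps the base letter. It assumes the file is saved as UTF-8; the second regex and the variable names are my own illustration, not part of the trick as found.

```php
<?php
// Assumes this file is saved as UTF-8.
$string = 'Fóø Bår';
$entities = htmlentities($string, ENT_QUOTES, 'UTF-8'); // F&oacute;&oslash; B&aring;r

// Variant from the answer: drop every entity, losing the letters entirely.
$stripped = preg_replace('`&[^;]+;`', '', $entities);

// Illustrative variant: keep the base letter of common accent entities (&oacute; -> o).
$kept = preg_replace('`&([A-Za-z])(?:acute|grave|circ|uml|tilde|cedil|ring|slash|caron);`', '$1', $entities);

echo $stripped, "\n", $kept, "\n";
```

Characters without a named HTML entity (č, for example) pass through htmlentities unchanged, so neither variant touches them.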

However, since you're writing about creating a URL string, urlencode and its counterpart urldecode may be a better fit. Or, if you are building a query string, use http_build_query.
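
A minimal sketch of those functions, assuming the file is saved as UTF-8; the parameter names `title` and `page` are made up for illustration.

```php
<?php
// Assumes this file is saved as UTF-8; 'title' and 'page' are hypothetical parameter names.
$title = 'Fóø Bår';

// urlencode() percent-encodes each UTF-8 byte and turns spaces into '+'.
$encoded = urlencode($title);

// urldecode() restores the original string.
$decoded = urldecode($encoded);

// http_build_query() encodes a whole array of parameters in one call.
$query = http_build_query(array('title' => $title, 'page' => 2));

echo $encoded, "\n", $query, "\n";
```

Note that this keeps the accents (in encoded form) rather than removing them, so it answers a slightly different need than transliteration.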

-1

WordPress' implementation is definitely the safest for UTF-8 strings. For Latin-1 strings, a simple strtr does the job, but make sure you save your script in Latin-1 format, not UTF-8.
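
For the Latin-1 case, a rough sketch of the strtr approach follows. The function name is hypothetical and the table covers only a handful of letters; byte escapes are used instead of literal characters so the example does not depend on how the script file is saved.

```php
<?php
// Hypothetical helper; covers only a few Latin-1 letters for illustration.
function stripLatin1Accents(string $latin1): string
{
    // Three-argument strtr() maps byte to byte, so both lists must be
    // single-byte strings of equal length (which is why UTF-8 input breaks it).
    $from = "\xE0\xE1\xE2\xE4\xE7\xE8\xE9\xEA\xEB\xEE\xEF\xF4\xF6\xF9\xFB\xFC"; // àáâäçèéêëîïôöùûü
    $to   = "aaaaceeeeiioouuu";
    return strtr($latin1, $from, $to);
}

// "caf\xE9 cr\xE8me" is "café crème" encoded as ISO-8859-1.
echo stripLatin1Accents("caf\xE9 cr\xE8me"), "\n";
```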

NicolasBernier
  • 1,586
  • 14
  • 15
-1
$unwanted_array = array(    '&amp;' => 'and', '&' => 'and', '@' => 'at', '©' => 'c', '®' => 'r', 
'̊'=>'','̧'=>'','̨'=>'','̄'=>'','̱'=>'',
'Á'=>'a','á'=>'a','À'=>'a','à'=>'a','Ă'=>'a','ă'=>'a','ắ'=>'a','Ắ'=>'A','Ằ'=>'A',
'ằ'=>'a','ẵ'=>'a','Ẵ'=>'A','ẳ'=>'a','Ẳ'=>'A','Â'=>'a','â'=>'a','ấ'=>'a','Ấ'=>'A',
'ầ'=>'a','Ầ'=>'a','ẩ'=>'a','Ẩ'=>'A','Ǎ'=>'a','ǎ'=>'a','Å'=>'a','å'=>'a','Ǻ'=>'a',
'ǻ'=>'a','Ä'=>'a','ä'=>'a','ã'=>'a','Ã'=>'A','Ą'=>'a','ą'=>'a','Ā'=>'a','ā'=>'a',
'ả'=>'a','Ả'=>'a','Ạ'=>'A','ạ'=>'a','ặ'=>'a','Ặ'=>'A','ậ'=>'a','Ậ'=>'A','Æ'=>'ae',
'æ'=>'ae','Ǽ'=>'ae','ǽ'=>'ae','ẫ'=>'a','Ẫ'=>'A',
'Ć'=>'c','ć'=>'c','Ĉ'=>'c','ĉ'=>'c','Č'=>'c','č'=>'c','Ċ'=>'c','ċ'=>'c','Ç'=>'c','ç'=>'c',
'Ď'=>'d','ď'=>'d','Ḑ'=>'D','ḑ'=>'d','Đ'=>'d','đ'=>'d','Ḍ'=>'D','ḍ'=>'d','Ḏ'=>'D','ḏ'=>'d','ð'=>'d','Ð'=>'D',
'É'=>'e','é'=>'e','È'=>'e','è'=>'e','Ĕ'=>'e','ĕ'=>'e','ê'=>'e','ế'=>'e','Ế'=>'E','ề'=>'e',
'Ề'=>'E','Ě'=>'e','ě'=>'e','Ë'=>'e','ë'=>'e','Ė'=>'e','ė'=>'e','Ę'=>'e','ę'=>'e','Ē'=>'e',
'ē'=>'e','ệ'=>'e','Ệ'=>'E','Ə'=>'e','ə'=>'e','ẽ'=>'e','Ẽ'=>'E','ễ'=>'e',
'Ễ'=>'E','ể'=>'e','Ể'=>'E','ẻ'=>'e','Ẻ'=>'E','ẹ'=>'e','Ẹ'=>'E',
'ƒ'=>'f',
'Ğ'=>'g','ğ'=>'g','Ĝ'=>'g','ĝ'=>'g','Ǧ'=>'G','ǧ'=>'g','Ġ'=>'g','ġ'=>'g','Ģ'=>'g','ģ'=>'g',
'H̲'=>'H','h̲'=>'h','Ĥ'=>'h','ĥ'=>'h','Ȟ'=>'H','ȟ'=>'h','Ḩ'=>'H','ḩ'=>'h','Ħ'=>'h','ħ'=>'h','Ḥ'=>'H','ḥ'=>'h',
'Ỉ'=>'I','Í'=>'i','í'=>'i','Ì'=>'i','ì'=>'i','Ĭ'=>'i','ĭ'=>'i','Î'=>'i','î'=>'i','Ǐ'=>'i','ǐ'=>'i',
'Ï'=>'i','ï'=>'i','Ḯ'=>'I','ḯ'=>'i','Ĩ'=>'i','ĩ'=>'i','İ'=>'i','Į'=>'i','į'=>'i','Ī'=>'i','ī'=>'i',
'ỉ'=>'I','Ị'=>'I','ị'=>'i','IJ'=>'ij','ij'=>'ij','ı'=>'i',
'Ĵ'=>'j','ĵ'=>'j',
'Ķ'=>'k','ķ'=>'k','Ḵ'=>'K','ḵ'=>'k',
'Ĺ'=>'l','ĺ'=>'l','Ľ'=>'l','ľ'=>'l','Ļ'=>'l','ļ'=>'l','Ł'=>'l','ł'=>'l','Ŀ'=>'l','ŀ'=>'l',
'Ń'=>'n','ń'=>'n','Ň'=>'n','ň'=>'n','Ñ'=>'N','ñ'=>'n','Ņ'=>'n','ņ'=>'n','Ṇ'=>'N','ṇ'=>'n','Ŋ'=>'n','ŋ'=>'n',
'Ó'=>'o','ó'=>'o','Ò'=>'o','ò'=>'o','Ŏ'=>'o','ŏ'=>'o','Ô'=>'o','ô'=>'o','ố'=>'o','Ố'=>'O','ồ'=>'o',
'Ồ'=>'O','ổ'=>'o','Ổ'=>'O','Ǒ'=>'o','ǒ'=>'o','Ö'=>'o','ö'=>'o','Ő'=>'o','ő'=>'o','Õ'=>'o','õ'=>'o',
'Ø'=>'o','ø'=>'o','Ǿ'=>'o','ǿ'=>'o','Ǫ'=>'O','ǫ'=>'o','Ǭ'=>'O','ǭ'=>'o','Ō'=>'o','ō'=>'o','ỏ'=>'o',
'Ỏ'=>'O','Ơ'=>'o','ơ'=>'o','ớ'=>'o','Ớ'=>'O','ờ'=>'o','Ờ'=>'O','ở'=>'o','Ở'=>'O','ợ'=>'o','Ợ'=>'O',
'ọ'=>'o','Ọ'=>'O','ọ'=>'o','Ọ'=>'O','ộ'=>'o','Ộ'=>'O','ỗ'=>'o','Ỗ'=>'O','ỡ'=>'o','Ỡ'=>'O',
'Œ'=>'oe','œ'=>'oe',
'ĸ'=>'k',
'Ŕ'=>'r','ŕ'=>'r','Ř'=>'r','ř'=>'r','ṙ'=>'r','Ŗ'=>'r','ŗ'=>'r','Ṛ'=>'R','ṛ'=>'r','Ṟ'=>'R','ṟ'=>'r',
'S̲'=>'S','s̲'=>'s','Ś'=>'s','ś'=>'s','Ŝ'=>'s','ŝ'=>'s','Š'=>'s','š'=>'s','Ş'=>'s','ş'=>'s',
'Ṣ'=>'S','ṣ'=>'s','Ș'=>'S','ș'=>'s',
'ſ'=>'z','ß'=>'ss','Ť'=>'t','ť'=>'t','Ţ'=>'t','ţ'=>'t','Ṭ'=>'T','ṭ'=>'t','Ț'=>'T',
'ț'=>'t','Ṯ'=>'T','ṯ'=>'t','™'=>'tm','Ŧ'=>'t','ŧ'=>'t',
'Ú'=>'u','ú'=>'u','Ù'=>'u','ù'=>'u','Ŭ'=>'u','ŭ'=>'u','Û'=>'u','û'=>'u','Ǔ'=>'u','ǔ'=>'u','Ů'=>'u','ů'=>'u',
'Ü'=>'u','ü'=>'u','Ǘ'=>'u','ǘ'=>'u','Ǜ'=>'u','ǜ'=>'u','Ǚ'=>'u','ǚ'=>'u','Ǖ'=>'u','ǖ'=>'u','Ű'=>'u','ű'=>'u',
'Ũ'=>'u','ũ'=>'u','Ų'=>'u','ų'=>'u','Ū'=>'u','ū'=>'u','Ư'=>'u','ư'=>'u','ứ'=>'u','Ứ'=>'U','ừ'=>'u','Ừ'=>'U',
'ử'=>'u','Ử'=>'U','ự'=>'u','Ự'=>'U','ụ'=>'u','Ụ'=>'U','ủ'=>'u','Ủ'=>'U','ữ'=>'u','Ữ'=>'U',
'Ŵ'=>'w','ŵ'=>'w',
'Ý'=>'y','ý'=>'y','ỳ'=>'y','Ỳ'=>'Y','Ŷ'=>'y','ŷ'=>'y','ÿ'=>'y','Ÿ'=>'y','ỹ'=>'y','Ỹ'=>'Y','ỷ'=>'y','Ỷ'=>'Y',
'Z̲'=>'Z','z̲'=>'z','Ź'=>'z','ź'=>'z','Ž'=>'z','ž'=>'z','Ż'=>'z','ż'=>'z','Ẕ'=>'Z','ẕ'=>'z',
'þ'=>'p','ʼn'=>'n','А'=>'a','а'=>'a','Б'=>'b','б'=>'b','В'=>'v','в'=>'v','Г'=>'g','г'=>'g','Ґ'=>'g','ґ'=>'g',
'Д'=>'d','д'=>'d','Е'=>'e','е'=>'e','Ё'=>'jo','ё'=>'jo','Є'=>'e','є'=>'e','Ж'=>'zh','ж'=>'zh','З'=>'z','з'=>'z',
'И'=>'i','и'=>'i','І'=>'i','і'=>'i','Ї'=>'i','ї'=>'i','Й'=>'j','й'=>'j','К'=>'k','к'=>'k','Л'=>'l','л'=>'l',
'М'=>'m','м'=>'m','Н'=>'n','н'=>'n','О'=>'o','о'=>'o','П'=>'p','п'=>'p','Р'=>'r','р'=>'r','С'=>'s','с'=>'s',
'Т'=>'t','т'=>'t','У'=>'u','у'=>'u','Ф'=>'f','ф'=>'f','Х'=>'h','х'=>'h','Ц'=>'c','ц'=>'c','Ч'=>'ch','ч'=>'ch',
'Ш'=>'sh','ш'=>'sh','Щ'=>'sch','щ'=>'sch','Ъ'=>'-',
'ъ'=>'-','Ы'=>'y','ы'=>'y','Ь'=>'-','ь'=>'-',
'Э'=>'je','э'=>'je','Ю'=>'ju','ю'=>'ju','Я'=>'ja','я'=>'ja','א'=>'a','ב'=>'b','ג'=>'g','ד'=>'d','ה'=>'h','ו'=>'v',
'ז'=>'z','ח'=>'h','ט'=>'t','י'=>'i','ך'=>'k','כ'=>'k','ל'=>'l','ם'=>'m','מ'=>'m','ן'=>'n','נ'=>'n','ס'=>'s','ע'=>'e',
'ף'=>'p','פ'=>'p','ץ'=>'C','צ'=>'c','ק'=>'q','ר'=>'r','ש'=>'w','ת'=>'t'
);

$accentsRemoved = strtr( $stringToRemoveAccents , $unwanted_array );
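
A quick usage sketch with a reduced table shows why the array form of strtr is safe for UTF-8. The map and variable names here are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Reduced, illustrative table; the full $unwanted_array above covers far more characters.
$sample_map = array('č' => 'c', 'š' => 's', 'ž' => 'z', 'ß' => 'ss', 'Ж' => 'zh');

// With an array argument, strtr() matches whole keys (longest first),
// so multi-byte UTF-8 keys and multi-character replacements both work.
$result = strtr('čaša žßЖ', $sample_map);

echo $result, "\n";
```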
  • Why is this an improvement on, or a better solution than, the accepted answer to this old question? – bcesars Mar 26 '15 at 14:13
  • Diego Castillo answered this: http://cubiq.org/the-perfect-php-clean-url-generator and it's better –  Mar 27 '15 at 13:45
  • this is the only answer that has all the accents. majority of other answers don't even contain č which is very common in european languages – user151496 May 24 '16 at 09:13