16

I need to convert strings that contain German characters to their Latin equivalents. For example:

'Höhle' => 'Hohle'
hakre
Jacek Francuz
  • Ok. What if I need this to work for any other language that is valid in UTF-8? Is that impossible? Are you saying there is no built-in function or library written in PHP that deals with it? – Jacek Francuz Jun 08 '11 at 19:40

7 Answers

13

It obviously does not cover every single character, but should help with some of the more common ones:

<?php
/**
 * Replaces special characters in a string with their "non-special" counterpart.
 *
 * Useful for friendly URLs.
 *
 * @access public
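 * @param string $string UTF-8 encoded input
 * @return string Input with mapped characters replaced and remaining non-ASCII characters removed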
 */
function convertAccentsAndSpecialToNormal($string) {
    $table = array(
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Ă'=>'A', 'Ā'=>'A', 'Ą'=>'A', 'Æ'=>'A', 'Ǽ'=>'A',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'ă'=>'a', 'ā'=>'a', 'ą'=>'a', 'æ'=>'a', 'ǽ'=>'a',

        'Þ'=>'B', 'þ'=>'b', 'ß'=>'ss',

        'Ç'=>'C', 'Č'=>'C', 'Ć'=>'C', 'Ĉ'=>'C', 'Ċ'=>'C',
        'ç'=>'c', 'č'=>'c', 'ć'=>'c', 'ĉ'=>'c', 'ċ'=>'c',

        'Đ'=>'Dj', 'Ď'=>'D',
        'đ'=>'dj', 'ď'=>'d',

        'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ĕ'=>'E', 'Ē'=>'E', 'Ę'=>'E', 'Ė'=>'E',
        'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'ę'=>'e', 'ė'=>'e',

        'Ĝ'=>'G', 'Ğ'=>'G', 'Ġ'=>'G', 'Ģ'=>'G',
        'ĝ'=>'g', 'ğ'=>'g', 'ġ'=>'g', 'ģ'=>'g',

        'Ĥ'=>'H', 'Ħ'=>'H',
        'ĥ'=>'h', 'ħ'=>'h',

        'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'İ'=>'I', 'Ĩ'=>'I', 'Ī'=>'I', 'Ĭ'=>'I', 'Į'=>'I',
        'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'į'=>'i', 'ĩ'=>'i', 'ī'=>'i', 'ĭ'=>'i', 'ı'=>'i',

        'Ĵ'=>'J',
        'ĵ'=>'j',

        'Ķ'=>'K',
        'ķ'=>'k', 'ĸ'=>'k',

        'Ĺ'=>'L', 'Ļ'=>'L', 'Ľ'=>'L', 'Ŀ'=>'L', 'Ł'=>'L',
        'ĺ'=>'l', 'ļ'=>'l', 'ľ'=>'l', 'ŀ'=>'l', 'ł'=>'l',

        'Ñ'=>'N', 'Ń'=>'N', 'Ň'=>'N', 'Ņ'=>'N', 'Ŋ'=>'N',
        'ñ'=>'n', 'ń'=>'n', 'ň'=>'n', 'ņ'=>'n', 'ŋ'=>'n', 'ʼn'=>'n',

        'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ō'=>'O', 'Ŏ'=>'O', 'Ő'=>'O', 'Œ'=>'O',
        'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ō'=>'o', 'ŏ'=>'o', 'ő'=>'o', 'œ'=>'o', 'ð'=>'o',

        'Ŕ'=>'R', 'Ř'=>'R',
        'ŕ'=>'r', 'ř'=>'r', 'ŗ'=>'r',

        'Š'=>'S', 'Ŝ'=>'S', 'Ś'=>'S', 'Ş'=>'S',
        'š'=>'s', 'ŝ'=>'s', 'ś'=>'s', 'ş'=>'s',

        'Ŧ'=>'T', 'Ţ'=>'T', 'Ť'=>'T',
        'ŧ'=>'t', 'ţ'=>'t', 'ť'=>'t',

        'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ũ'=>'U', 'Ū'=>'U', 'Ŭ'=>'U', 'Ů'=>'U', 'Ű'=>'U', 'Ų'=>'U',
        'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ũ'=>'u', 'ū'=>'u', 'ŭ'=>'u', 'ů'=>'u', 'ű'=>'u', 'ų'=>'u',

        'Ŵ'=>'W', 'Ẁ'=>'W', 'Ẃ'=>'W', 'Ẅ'=>'W',
        'ŵ'=>'w', 'ẁ'=>'w', 'ẃ'=>'w', 'ẅ'=>'w',

        'Ý'=>'Y', 'Ÿ'=>'Y', 'Ŷ'=>'Y',
        'ý'=>'y', 'ÿ'=>'y', 'ŷ'=>'y',

        'Ž'=>'Z', 'Ź'=>'Z', 'Ż'=>'Z',
        'ž'=>'z', 'ź'=>'z', 'ż'=>'z',

        '“'=>'"', '”'=>'"', '‘'=>"'", '’'=>"'", '•'=>'-', '…'=>'...', '—'=>'-', '–'=>'-', '¿'=>'?', '¡'=>'!', '°'=>' degrees ',
        '¼'=>' 1/4 ', '½'=>' 1/2 ', '¾'=>' 3/4 ', '⅓'=>' 1/3 ', '⅔'=>' 2/3 ', '⅛'=>' 1/8 ', '⅜'=>' 3/8 ', '⅝'=>' 5/8 ', '⅞'=>' 7/8 ',
        '÷'=>' divided by ', '×'=>' times ', '±'=>' plus-minus ', '√'=>' square root ', '∞'=>' infinity ',
        '≈'=>' almost equal to ', '≠'=>' not equal to ', '≡'=>' identical to ', '≤'=>' less than or equal to ', '≥'=>' greater than or equal to ',
        '←'=>' left ', '→'=>' right ', '↑'=>' up ', '↓'=>' down ', '↔'=>' left and right ', '↕'=>' up and down ',
        '℅'=>' care of ', '℮' => ' estimated ',
        'Ω'=>' ohm ',
        '♀'=>' female ', '♂'=>' male ',
        '©'=>' Copyright ', '®'=>' Registered ', '™' =>' Trademark ',
    );

    $string = strtr($string, $table);
    // Currency symbols (£¤¥€) are not handled for now.
    // Remove whatever is still not a tab, LF, CR, or in the range 0x20-0x7F.
    $string = preg_replace("/[^\x9\xA\xD\x20-\x7F]/u", "", $string);

    return $string;
}
Anton
simshaun
10

The easiest way to do that would be:

echo transliterator_transliterate('Any-Latin; Latin-ASCII', "Höhle"); // prints Hohle (requires the intl extension)
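Comments on this answer report that `Latin-ASCII` was not available on every setup. The following is a minimal sketch, not part of the original answer, that tries that rule first and falls back to stripping combining marks. The helper name `toAscii` and the fallback rule string are assumptions chosen for illustration, and the intl extension is assumed to be loaded.

```php
<?php
// Sketch only: toAscii() is a hypothetical helper, not a built-in function.
// Assumes the intl extension (ICU) is loaded.
function toAscii(string $text): string
{
    // Candidate transliterator IDs, tried in order.
    $ids = [
        'Any-Latin; Latin-ASCII',
        'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC',
    ];

    foreach ($ids as $id) {
        // create() returns null when the ID cannot be resolved.
        $transliterator = Transliterator::create($id);
        if ($transliterator === null) {
            continue;
        }

        $result = $transliterator->transliterate($text);
        if ($result !== false) {
            return $result;
        }
    }

    // Nothing worked: hand the input back unchanged.
    return $text;
}

echo toAscii('Höhle'), PHP_EOL;
```

The fallback rule only removes accents, so characters without a decomposition (for example `ß` or `ø`) would pass through unchanged on that path.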
Alive to die - Anant
Matej Balantič
  • Does not work anymore. There is a problem with Latin-ASCII – instead Jun 26 '16 at 15:05
  • 1
    @instead why do you say it does not work? It works on PHP 7.3 with the example given in the answer. It also works for Lithuanian characters. – Aurelijus Rozenas Jan 10 '20 at 09:09
  • 2
    @AurelijusRozenas The answer was posted in 2013, my comment is from 2016. PHP 7.3 didn't even exist back then. My guess is that I tested on PHP 5.6 and it didn't work for me. Also please note that the answer doesn't state the PHP version where it works. FYI, I didn't downvote though. I just left a comment. – instead Jan 10 '20 at 21:11
  • You are a life saver! I tried so many solutions to overcome MongoDB errors but none worked. Thanks so much. – Maximus May 02 '20 at 04:29
4

You can also use iconv:

iconv('UTF-8', 'ASCII//TRANSLIT', 'Höhle')
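As the comments on this answer note, what `//TRANSLIT` produces depends on the iconv implementation and on the active locale. The sketch below, which is not from the original answer, sets `LC_CTYPE` explicitly before converting. The locale name `de_DE.UTF-8` is an assumption and has to be installed on the system for the call to have any effect.

```php
<?php
// Sketch only: the locale name below is an assumption, not a requirement.
$previous = setlocale(LC_CTYPE, '0');   // "0" only queries the current setting
setlocale(LC_CTYPE, 'de_DE.UTF-8');     // returns false if the locale is missing

// The result differs between iconv implementations (glibc, libiconv, ...),
// so no particular output is guaranteed; iconv() returns false on failure.
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT', 'Höhle');
var_dump($ascii);

// Restore the locale that was active before.
if ($previous !== false) {
    setlocale(LC_CTYPE, $previous);
}
```

Because the output is not portable, this approach is probably best kept for environments where the locale and iconv library are under your control.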
pvessel
  • 2
    This solution gave me the following output: H"ohle – Ismael May 27 '14 at 13:18
  • Sadly, it doesn't seem to support Cyrillic scripts for me. –  Oct 04 '17 at 16:51
  • `iconv` seems to produce weird outcomes for `ö` and `ü`. And it seems it works in accordance with `locale`. I don't like this. Back to manual list/replacing... :\ – akinuri Nov 24 '20 at 13:05
4

For Brazilian Portuguese I use the following:

$string = 'transformação';
// The /u modifier makes PCRE treat the patterns and the subject as UTF-8.
$search = array('/é/u', '/ç/u', '/ã/u', '/á/u', '/ó/u', '/ú/u');
$replace = array('e', 'c', 'a', 'a', 'o', 'u');
$new_string = preg_replace($search, $replace, $string);
echo $new_string; // transformacao

You will need to provide the characters in both upper and lower case to meet your needs.

Ryan
2

Use the Normalizer class from PHP's intl extension.

http://www.php.net/manual/en/class.normalizer.php

<?php
$string = 'Höhle';
echo Normalizer::normalize($string);
?>
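As the comments on this answer point out, `Normalizer::normalize()` alone only unifies different Unicode representations of the same character and leaves `ö` in place. A minimal sketch of how normalization can still help, not taken from the original answer: decompose to NFD, then remove the combining marks with a regular expression. The helper name `stripDiacritics` is an assumption for illustration, and the intl extension is assumed to be loaded.

```php
<?php
// Sketch only: stripDiacritics() is a hypothetical helper name.
// Assumes the intl extension is loaded and the input is valid UTF-8.
function stripDiacritics(string $text): string
{
    // NFD splits "ö" into "o" followed by U+0308 (combining diaeresis).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed, e.g. on invalid UTF-8
    }

    // \p{Mn} matches nonspacing (combining) marks; /u enables UTF-8 mode.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    return $stripped ?? $text;
}

echo stripDiacritics('Höhle'), PHP_EOL;
```

Characters that have no canonical decomposition, such as `ß` or `ł`, are left unchanged by this approach and still need a lookup table or a transliterator.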
Tomasz Kowalczyk
  • 3
    Ok. How do I use it in practice? :P – Jacek Francuz Jun 08 '11 at 19:43
  • 2
    Oh, a fellow countryman, how nice. ;] I've updated my answer to cover your case; ask if you have further questions. – Tomasz Kowalczyk Jun 08 '11 at 19:45
  • Yep. That's actually great. Will surely use it. Thanks a lot!:) – Jacek Francuz Jun 08 '11 at 19:47
  • 2
    No problem, nice to meet you. ;] – Tomasz Kowalczyk Jun 08 '11 at 19:47
  • This combined with the PHP transliterator (http://www.php.net/manual/en/transliterator.transliterate.php) can even support Cyrillic and Chinese, e.g. `自由嘅百科全書` got transformed into `zi-you-kai-bai-ke-quan-shu` :) Thanks Tomek –  Feb 13 '13 at 13:47
  • 3
    I have no idea why this answer was accepted, this is not at all what the Normalizer does. – deceze Sep 14 '13 at 13:02
  • 2
    Friendly reminder that you probably have *upvoted* my answer instead of desired downvote. I know what Normalizer does, that's just nice side-effect. If you know better *automatic* answer, please edit my answer or contact me - I'll be happy to learn new solution. If you don't know better answer (other than hand-replaces below) that would work for OP, don't criticize for the sake of doing it. – Tomasz Kowalczyk Oct 17 '13 at 14:20
  • Ha ha, that's funny, indeed I did make a mistake with upvoting. The problem is that your solution does not answer the question at all. The function only normalizes different UTF-8 representations of the same character to a single one. The special characters still remain; they are not converted to their Latin equivalents. You can read more about it here: http://www.ibm.com/developerworks/library/os-php-5.3unicode/ – Matej Balantič Oct 23 '13 at 15:18
1

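You can use strtr, but you need an array of equivalent characters:

function transliterate($st) {
   // Use the array form of strtr(): the three-argument form maps byte to byte
   // and would corrupt multibyte UTF-8 characters.
   $map = array(
        'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss',
        'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        // extend with further pairs as needed
   );
   return strtr($st, $map);
}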
Dmitri Gudkov
-1

If it were me, I'd do something like this...

$map = array(   'ö' => 'o',
                // etc, etc, etc
);

foreach( $map as $orig => $new )
{
   $myString = str_replace( $orig, $new, $myString );
}
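The loop can also be collapsed into one call, since `str_replace()` accepts arrays for both the search and the replacement argument. This is a minimal sketch with a small sample map; the entries are illustrative assumptions rather than a complete table, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Sketch only: the map is a small illustrative sample, not a complete table.
$map = [
    'ä' => 'a', 'ö' => 'o', 'ü' => 'u',
    'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
    'ß' => 'ss',
];

$myString = 'Höhle';

// Keys become the search list, values the replacement list (same order).
$myString = str_replace(array_keys($map), array_values($map), $myString);

echo $myString, PHP_EOL;
```

Note that `str_replace()` applies the pairs one after another, so a replacement value that contains a later search key would be replaced again; `strtr()` with an array does not have that behavior.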
KOGI