36

How to transliterate Cyrillic characters into Latin letters?

E.g. Главная страница -> Glavnaja stranica

This Transliteration PHP Extension would do this very well, but I can't install it on my server.

It would be best to have the same implementation but in PHP.

casperOne
Sfisioza

14 Answers

74

Try the following code:

$textcyr="Тествам с кирилица";
        $textlat="I pone dotuk raboti!";
        $cyr = ['Љ', 'Њ', 'Џ', 'џ', 'ш', 'ђ', 'ч', 'ћ', 'ж', 'љ', 'њ', 'Ш', 'Ђ', 'Ч', 'Ћ', 'Ж','Ц','ц', 'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п', 'р','с','т','у','ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я', 'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П', 'Р','С','Т','У','Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'
        ];
        $lat = ['Lj', 'Nj', 'Dž', 'dž', 'š', 'đ', 'č', 'ć', 'ž', 'lj', 'nj', 'Š', 'Đ', 'Č', 'Ć', 'Ž','C','c', 'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p', 'r','s','t','u','f','h','ts','ch','sh','sht','a','i','y','e','yu','ya', 'A','B','V','G','D','E','Io','Zh','Z','I','Y','K','L','M','N','O','P', 'R','S','T','U','F','H','Ts','Ch','Sh','Sht','A','I','Y','E','Yu','Ya'
        ];
        $textcyr = str_replace($cyr, $lat, $textcyr);
        $textlat = str_replace($lat, $cyr, $textlat);
        echo("$textcyr $textlat");
Community
Tural Ali
46

@Tural Teyyuboglu

Your code has a problem: if you transliterate, for example, "щеки" to Latin and then back to Cyrillic, it produces something like "схтеки". The multi-byte characters (that is, the letters whose Latin form is more than one character long) must appear first in the array, like this:

function transliterate($textcyr = null, $textlat = null) {
    $cyr = array(
    'ж',  'ч',  'щ',   'ш',  'ю',  'а', 'б', 'в', 'г', 'д', 'е', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ъ', 'ь', 'я',
    'Ж',  'Ч',  'Щ',   'Ш',  'Ю',  'А', 'Б', 'В', 'Г', 'Д', 'Е', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ъ', 'Ь', 'Я');
    $lat = array(
    'zh', 'ch', 'sht', 'sh', 'yu', 'a', 'b', 'v', 'g', 'd', 'e', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'y', 'x', 'q',
    'Zh', 'Ch', 'Sht', 'Sh', 'Yu', 'A', 'B', 'V', 'G', 'D', 'E', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C', 'Y', 'X', 'Q');
    if($textcyr) return str_replace($cyr, $lat, $textcyr);
    else if($textlat) return str_replace($lat, $cyr, $textlat);
    else return null;
}

echo transliterate(null, transliterate("щеки")) == "щеки";
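
The ordering constraint comes from how str_replace() walks its search array: the pairs are applied one after another, so a short key can consume part of a longer one. As a minimal sketch (the small table below is a hypothetical excerpt, not the full alphabet), strtr() with an array argument tries the longest matching key first and does not re-scan text it has already replaced, so the order of the table stops mattering:

```php
<?php
// Hypothetical excerpt of a Latin => Cyrillic table, listed shortest keys first on purpose.
$latToCyr = [
    's' => 'с', 'h' => 'х', 't' => 'т',
    'e' => 'е', 'k' => 'к', 'i' => 'и',
    'sh' => 'ш', 'sht' => 'щ',
];

// str_replace() applies the pairs in array order, so 's', 'h' and 't'
// are consumed before 'sht' is ever tried.
$viaStrReplace = str_replace(array_keys($latToCyr), array_values($latToCyr), 'shteki');

// strtr() with an array picks the longest matching key at each position.
$viaStrtr = strtr('shteki', $latToCyr);

echo $viaStrReplace, "\n"; // схтеки
echo $viaStrtr, "\n";      // щеки
```

This only removes the ordering problem; it does not make an ambiguous table reversible.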

bobef
  • I tested my code with 40k words from the dictionary and it has flaws in cases like "безизходен" -> "bezizhoden" -> "безижоден". If I manage to find a solution that both keeps human readability and is lossless I will post the solution (on my site). – bobef Dec 01 '11 at 10:28
  • Yes. I found that there is no lossless solution. Simply one alphabet has ~30 chars and the other has ~26 chars. – bobef Aug 08 '14 at 13:28
  • 2
    There is a typo: 'e' in the $cyr array is actually Latin 'e'. They look the same, but this will cause problems in further transformations. – d.raev Jan 05 '15 at 08:44
  • 1
    You have Ы and Ё missing. – Yaroslav Mar 14 '16 at 14:34
  • Quite useful, thanks! However, this conversion table uses the so called "chat spelling" for the letters "ц", "ъ", "ь", and "я". If you want to make it compliant with transliteration rules, you need to change "ц" => "ts", "ъ" => "a", "ь" => "y", and "я" => "ya". – cheeseus Jul 09 '16 at 16:52
  • "The multi-byte characters must appear first ", but you f-ed it up... Your big multibyte characters should be after small multibyte characters, so something like ['ж', 'ч', 'щ', 'ш', 'ю','Ж', 'Ч', 'Щ', 'Ш', 'Ю',, ] – Enis P. Aginić Dec 25 '19 at 08:13
22

The best option is to use the PHP Intl extension. You might need to install it first.

This will do the trick:

$transliteratedString = transliterator_transliterate('Russian-Latin/BGN', $cyrillicString);

I applied 'Russian-Latin/BGN' because the asker used Russian language in his question. However, there are options for other languages written in the Cyrillic script. To view all of them do this:

print_r(transliterator_list_ids());
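
If the intl extension may be absent on some servers (as in the question), one option is to guard the call and fall back to a plain lookup table. A minimal sketch follows; the function name `to_latin_with_fallback` and the tiny `$fallback` table are hypothetical, and the table would need to cover the full alphabet in real use:

```php
<?php
/**
 * Sketch: use ICU when intl is loaded, otherwise a caller-supplied table.
 * The function name and the $fallback parameter are illustrative assumptions.
 */
function to_latin_with_fallback(string $text, array $fallback): string
{
    if (function_exists('transliterator_transliterate')) {
        $result = transliterator_transliterate('Russian-Latin/BGN', $text);
        if ($result !== false) {
            return $result;
        }
    }
    // No intl (or the transliterator could not be created): plain table lookup.
    return strtr($text, $fallback);
}

// Hypothetical partial table, just enough for the sample word.
$fallback = ['Г' => 'G', 'л' => 'l', 'а' => 'a', 'в' => 'v', 'н' => 'n', 'я' => 'ya'];
echo to_latin_with_fallback('Главная', $fallback), "\n";
```

The two branches are not guaranteed to produce identical spellings, so it is worth picking one scheme and making the fallback table match it.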
Ilyich
6
$textcyr="Тест на кирилице";
$textlat="Test na kirilitse!";
$cyr  = array('а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п','р','с','т','у', 
            'ф','х','ц','ч','ш','щ','ъ', 'ы','ь', 'э', 'ю','я','А','Б','В','Г','Д','Е','Ж','З','И','Й','К','Л','М','Н','О','П','Р','С','Т','У',
            'Ф','Х','Ц','Ч','Ш','Щ','Ъ', 'Ы','Ь', 'Э', 'Ю','Я' );
$lat = array( 'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p','r','s','t','u',
            'f' ,'h' ,'ts' ,'ch','sh' ,'sht' ,'a', 'i', 'y', 'e' ,'yu' ,'ya','A','B','V','G','D','E','Zh',
            'Z','I','Y','K','L','M','N','O','P','R','S','T','U',
            'F' ,'H' ,'Ts' ,'Ch','Sh' ,'Sht' ,'A' ,'I' ,'Y' ,'E' ,'Yu' ,'Ya' );

$textcyr = str_replace($cyr, $lat, $textcyr);
$textlat = str_replace($lat, $cyr, $textlat);
echo("$textcyr $textlat");

Note: the uppercase Ё is missing from both arrays.

urmaul
Av007
6

Here is a function that I use for cleaning characters in Bosnian, Croatian, and Serbian Latin text:

 function cleanUTF($name){
        $name = str_replace(array('š','č','đ','č','ć','ž','ñ'),array('s','c','d','c','c','z','n'), $name);
        $name = str_replace(array('Š','Č','Đ','Č','Ć', 'Ž','Ñ'),array('S','C','D','C','C','Z','N'), $name);
        $name = str_replace(array('а','б','в','г','д','е','ё','ж','з','и','й','к','л','љ','м','н','њ','о','п','р','с','т','у','ф','х','ц','ч','џ','ш','щ','ъ','ы','ь','э','ю','я','А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','Љ','М','Н','Њ','О','П','Р','С','Т','У','Ф','Х','Ц','Ч','Џ','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'),
                            array('a','b','v','g','d','e','e','z','z','i','j','k','l','lj','m','n','nj','o','p','r','s','t','u','f','h','c','c','dz','s','s','i','j','j','e','ju','ja','A','B','V','G','D','E','E','Z','Z','I','J','K','L','Lj','M','N','Nj','O','P','R','S','T','U','F','H','C','C','Dz','S','S','I','J','J','E','Ju','Ja'), $name);
        return $name;
    }
6

You should try iconv() with the //TRANSLIT option.

$trstr = iconv(<your encoding here>, "ISO-8859-1//TRANSLIT", $src_str);
Kerrek SB
  • 3
    I would have bet that this was the correct answer, but iconv() does not seem to support transliteration of Cyrillic characters. – Álvaro González Nov 27 '11 at 13:51
  • 5
    Ah, I see, `iconv()` doesn't have that transliteration scheme. ICU does, though, so if your PHP is compiled with ICU, you can use [`transliterate`](http://www.php.net/manual/en/transliterator.transliterate.php). (You might need to `aptitude install php5-intl` on your Debian-based machine.) – Kerrek SB Nov 27 '11 at 14:09
4

This is my version of a transliteration table for the Russian alphabet. It's unofficial but based on the technical standards GOST 7.79-2000 and GOST 16876-71. Multi-character sequences go first.

public static function transliterate($textcyr = null, $textlat = null) {
    $cyr = array(
        'ё',  'ж',  'х',  'ц',  'ч',  'щ',   'ш',  'ъ',  'э',  'ю',  'я',  'а', 'б', 'в', 'г', 'д', 'е', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'ь',
        'Ё',  'Ж',  'Х',  'Ц',  'Ч',  'Щ',   'Ш',  'Ъ',  'Э',  'Ю',  'Я',  'А', 'Б', 'В', 'Г', 'Д', 'Е', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Ь');
    $lat = array(
        'yo', 'zh', 'kh', 'ts', 'ch', 'shh', 'sh', '``', 'eh', 'yu', 'ya', 'a', 'b', 'v', 'g', 'd', 'e', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', '`',
        'Yo', 'Zh', 'Kh', 'Ts', 'Ch', 'Shh', 'Sh', '``', 'Eh', 'Yu', 'Ya', 'A', 'B', 'V', 'G', 'D', 'E', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', '`');
    if($textcyr)
        return str_replace($cyr, $lat, $textcyr);
    else if($textlat)
        return str_replace($lat, $cyr, $textlat);
    else
        return null;
}
4

This one worked best for me. Code is from this page

function ru2lat($str)
{
    $tr = array(
    "А"=>"a", "Б"=>"b", "В"=>"v", "Г"=>"g", "Д"=>"d",
    "Е"=>"e", "Ё"=>"yo", "Ж"=>"zh", "З"=>"z", "И"=>"i", 
    "Й"=>"j", "К"=>"k", "Л"=>"l", "М"=>"m", "Н"=>"n", 
    "О"=>"o", "П"=>"p", "Р"=>"r", "С"=>"s", "Т"=>"t", 
    "У"=>"u", "Ф"=>"f", "Х"=>"kh", "Ц"=>"ts", "Ч"=>"ch", 
    "Ш"=>"sh", "Щ"=>"sch", "Ъ"=>"", "Ы"=>"y", "Ь"=>"", 
    "Э"=>"e", "Ю"=>"yu", "Я"=>"ya", "а"=>"a", "б"=>"b", 
    "в"=>"v", "г"=>"g", "д"=>"d", "е"=>"e", "ё"=>"yo", 
    "ж"=>"zh", "з"=>"z", "и"=>"i", "й"=>"j", "к"=>"k", 
    "л"=>"l", "м"=>"m", "н"=>"n", "о"=>"o", "п"=>"p", 
    "р"=>"r", "с"=>"s", "т"=>"t", "у"=>"u", "ф"=>"f", 
    "х"=>"kh", "ц"=>"ts", "ч"=>"ch", "ш"=>"sh", "щ"=>"sch", 
    "ъ"=>"", "ы"=>"y", "ь"=>"", "э"=>"e", "ю"=>"yu", 
    "я"=>"ya", " "=>"-", "."=>"", ","=>"", "/"=>"-",  
    ":"=>"", ";"=>"","—"=>"", "–"=>"-"
    );
return strtr($str,$tr);
}

Hope this helps someone.

pc_
  • I stumbled upon this comment and I can't wrap my head around how it's supposed to work. `strstr` shows an error when I try to execute it `needle is not a string or an integer` – Moseleyi Nov 19 '18 at 17:23
  • @Moseleyi this function just simply replaces substrings, http://php.net/manual/en/function.strtr.php – pc_ Nov 19 '18 at 19:42
4

If you want a two-way conversion that is accurate to every letter, you need to improve the methods a little. I use the transliteration in URLs, and the URL parameter is then looked up in the database, so it is very important for me to keep the exact characters without replacing one with another.

!!! Support for Ukrainian symbols.

/**
 * @param $string
 *
 * @return string only cyrillic letter
 */
function to_cyrillic($string):string
{
    $gost = [
        "a" => "а", "b" => "б", "v" => "в", "g" => "г", "d" => "д", "e" => "е", "yo" => "ё",
        "j" => "ж", "z" => "з", "ii" => "и", "ji" => "й", "k" => "к",
        "l" => "л", "m" => "м", "n" => "н", "o" => "о", "p" => "п", "r" => "р", "s" => "с", "t" => "т",
        "y" => "у", "f" => "ф", "h" => "х", "c" => "ц",
        "ch" => "ч", "sh" => "ш", "sch" => "щ", "ie" => "ы", "u" => "ю", "ya" => "я", "A" => "А", "B" => "Б",
        "V" => "В", "G" => "Г", "D" => "Д", "E" => "Е", "Yo" => "Ё", "J" => "Ж", "Z" => "З", "I" => "И", "Ji" => "Й",
        "K" => "К", "L" => "Л", "M" => "М",
        "N" => "Н", "O" => "О", "P" => "П",
        "R" => "Р", "S" => "С", "T" => "Т", "Y" => "У", "F" => "Ф", "H" => "Х", "C" => "Ц", "Ch" => "Ч", "Sh" => "Ш",
        "Sch" => "Щ", "Ie" => "Ы", "U" => "Ю", "Ya" => "Я", "'" => "ь", "_'" => "Ь", "''" => "ъ", "_''" => "Ъ",
        "yi" => "ї", "ge" => "ґ",
        "ye" => "є",
        "Yi" => "Ї",
        "II" => "І",
        "Ge" => "Ґ",
        "YE" => "Є",
    ];
    return strtr($string, $gost);
}

/**
 * @param $string
 *
 * @return string only latin letter
 */
function to_latin($string):string
{
    $gost = [
        "а" => "a", "б" => "b", "в" => "v", "г" => "g", "д" => "d",
        "е" => "e", "ё" => "yo", "ж" => "j", "з" => "z", "и" => "ii",
        "й" => "ji", "к" => "k", "л" => "l", "м" => "m", "н" => "n",
        "о" => "o", "п" => "p", "р" => "r", "с" => "s", "т" => "t",
        "у" => "y", "ф" => "f", "х" => "h", "ц" => "c", "ч" => "ch",
        "ш" => "sh", "щ" => "sch", "ы" => "ie", "э" => "e", "ю" => "u",
        "я" => "ya",
        "А" => "A", "Б" => "B", "В" => "V", "Г" => "G", "Д" => "D",
        "Е" => "E", "Ё" => "Yo", "Ж" => "J", "З" => "Z", "И" => "I",
        "Й" => "Ji", "К" => "K", "Л" => "L", "М" => "M", "Н" => "N",
        "О" => "O", "П" => "P", "Р" => "R", "С" => "S", "Т" => "T",
        "У" => "Y", "Ф" => "F", "Х" => "H", "Ц" => "C", "Ч" => "Ch",
        "Ш" => "Sh", "Щ" => "Sch", "Ы" => "Ie", "Э" => "E", "Ю" => "U",
        "Я" => "Ya",
        "ь" => "'", "Ь" => "_'", "ъ" => "''", "Ъ" => "_''",
        "ї" => "yi",
        "і" => "ii",
        "ґ" => "ge",
        "є" => "ye",
        "Ї" => "Yi",
        "І" => "II",
        "Ґ" => "Ge",
        "Є" => "YE",
    ];
    return strtr($string, $gost);
}
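
Whether a table really survives a double conversion is easy to check mechanically. The sketch below is an illustration under stated assumptions rather than part of the answer: the helper name `survives_round_trip` and the small table are hypothetical, and the table deliberately uses the common "zh" spelling to show how a digraph that is also a plain letter pair ("z" + "h") breaks reversibility:

```php
<?php
// Hypothetical mini-table (Cyrillic => Latin), for illustration only.
$toLatin = [
    'а' => 'a', 'б' => 'b', 'д' => 'd', 'е' => 'e', 'ж' => 'zh', 'з' => 'z',
    'и' => 'i', 'н' => 'n', 'о' => 'o', 'х' => 'h',
];

/** True when $word is unchanged after Cyrillic -> Latin -> Cyrillic. */
function survives_round_trip(string $word, array $toLatin): bool
{
    // array_flip() keeps only one key per value, so a size change
    // means the table is not one-to-one.
    $toCyrillic = array_flip($toLatin);
    if (count($toCyrillic) !== count($toLatin)) {
        return false;
    }
    return strtr(strtr($word, $toLatin), $toCyrillic) === $word;
}

var_dump(survives_round_trip('жена', $toLatin));       // bool(true)
var_dump(survives_round_trip('безизходен', $toLatin)); // bool(false): "zh" is read back as ж
```

A one-to-one table is necessary but not sufficient; running a word list through a check like this is what reveals clashes between digraphs and letter pairs.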
Galaxy IT
3

I wrote a full transliteration class for all European languages for UTF-8. It may help. The comments are in Polish, but there aren't many of them, so here are a few hints:

  1. Numbers stored in constants are idCountry in a local database; change them as you like.
  2. "Rób transliterację dla " means "do transliteration for "; you determine the country by the const name.
  3. "Słownik tłumaczący rosyjską cyrylicę wg standardu " means "dictionary with transliteration by standard "
  4. "Tablica wycinająca akcenty z różnych znaków narodowych pobrana z http://stuffofinterest.com/misc/utf8-about.html" means "Array to cut off accents from different languages" (it might help if you find some errors in iconv, or cannot use it for some reason).
  5. Methods utf2ascii and cyr2lat are pretty obvious.

Hope it will help a few people 'cause implementing it was a nightmare :)

Edit: I just noticed that part of the code is missing so I've put the full class on Pastie: class

Tomasz Kapłoński
3

Respecting the Yandex transliteration rules (http://www.translityandex.ru/) and converting the upper case:

function translit_russian_filenames( $filename ) {
    $info = pathinfo( $filename );
    $ext  = empty( $info['extension'] ) ? '' : '.' . $info['extension'];
    $name = basename( $filename, $ext );
     $cyr = array(
    'а', 'б', 'в', 'г', 'д', 'е', 'ё', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я',
    'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я' );
    $lat = array(
    'a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya',
    'a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya');
    $name_translit = str_replace($cyr, $lat, $name);
    return $name_translit . $ext;
}
add_filter( 'sanitize_file_name', 'translit_russian_filenames', 10 );
1

Since all of the above are incomplete, here is my version:

    $textcyr="Тест на кирилице";
    $textlat="Test na kirilitse!";
         $cyr  = array('а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п','р','с','т','у', 
            'ф','х','ц','ч','ш','щ','ъ', 'ы','ь', 'э', 'ю','я',
            'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П','Р','С','Т','У',
            'Ф','Х','Ц','Ч','Ш','Щ','Ъ', 'Ы','Ь', 'Э', 'Ю','Я' );
$lat = array( 'a','b','v','g','d','e','e','zh','z','i','y','k','l','m','n','o','p','r','s','t','u',
            'f' ,'h' ,'ts' ,'ch','sh' ,'sht' ,'i', 'y', 'y', 'e' ,'yu' ,'ya','A','B','V','G','D','E','E','Zh',
            'Z','I','Y','K','L','M','N','O','P','R','S','T','U',
            'F' ,'H' ,'Ts' ,'Ch','Sh' ,'Sht' ,'I' ,'Y' ,'Y', 'E', 'Yu' ,'Ya' );

    $textcyr = str_replace($cyr, $lat, $textcyr);
    $textlat = str_replace($lat, $cyr, $textlat);
    echo("$textcyr $textlat");

I preferred ё = e, ъ = i, ы = y and э = e because that is the convention I use.

fnatic
0

For me, the best solution was to use

strtr("Информация",array('И'=>'I','н'=>'n','ф'=>'f', ...and so on... ))
Denis Rudov
0

$textcyr = 'Њушка Ћушка Љубав Ђато ђата части ';

$textlat = 'Ljubav njuška džoša Džoša';
$textlat = str_replace("nj","њ",$textlat);
$textlat = str_replace("Nj","Њ",$textlat);
$textlat = str_replace("lj","љ",$textlat);
$textlat = str_replace("Lj","Љ",$textlat);
$textlat = str_replace("dž","џ",$textlat);
$textlat = str_replace("Dž","Џ",$textlat);


$textcyr = str_replace($cyr, $lat, $textcyr);
$textlat = str_replace($lat, $cyr, $textlat);

echo $textcyr;
echo $textlat;