25

I have a method which turns any Latin text (e.g. English, French, German, Polish) into its slug form,

e.g. Alpha Bravo Charlie => alpha-bravo-charlie

But it doesn't work for Cyrillic text (e.g. Russian), so I want to transliterate the Cyrillic text to Latin characters first and then slugify the result.

Does anyone know a way to do such transliteration, whether as source code or as a library?

I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I could convert it.

Rui Jarimba
ckknight

10 Answers

22

You can use the open-source .NET library UnidecodeSharpFork to transliterate Cyrillic, and many other scripts, to Latin.

Example usage:

Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Testing Cyrillic:

/// <summary>
/// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
/// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
/// With converting "ё" to "yo".
/// </summary>
[TestMethod]
public void RussianAlphabetTest()
{
    string russianAlphabetLowercase = "а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я";
    string russianAlphabetUppercase = "А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я";

    string expectedLowercase = "a b v g d e yo zh z i y k l m n o p r s t u f kh ts ch sh shch \" y ' e yu ya";
    string expectedUppercase = "A B V G D E Yo Zh Z I Y K L M N O P R S T U F Kh Ts Ch Sh Shch \" Y ' E Yu Ya";

    Assert.AreEqual(expectedLowercase, russianAlphabetLowercase.Unidecode());
    Assert.AreEqual(expectedUppercase, russianAlphabetUppercase.Unidecode());
}

Simple, fast and powerful. It's also easy to extend or modify the transliteration table if you want to.

Dima Stefantsov
  • 6
    Wrong. This transliterates Анастасия as Anastasiya, and not Anastasia. This looks horrible. Seems like this document ( http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian) is wrong in the special provisions. Furthermore, you don't take the special provisions into account, and UnidecodeSharpFork transliterated German Umlauts (äöüÄÖÜ) as aouAOU instead of ae oe ue Ae Oe Ue. This is the reason I changed from Upvote to downvote. If you do a romanization library (or algorithm), do it properly, or otherwise state that your algorithm is incomplete/buggy and not ready for production. – Stefan Steiger Dec 25 '12 at 13:12
  • I use this workaround: string str = this.Name.Replace("ь", ""); str = str.Replace("ä", "ae"); str = str.Replace("ö", "oe"); str = str.Replace("ü", "ue"); str = str.Replace("Ä", "Ae"); str = str.Replace("Ö", "Oe"); str = str.Replace("Ü", "Ue"); str = UnidecodeSharpFork.Unidecoder.Unidecode(str); //str = str.Replace("Anastasiya", "Anastasia"); str = str.Replace("iy", "i"); //return this.Name.Unidecode(); return str; – Stefan Steiger Dec 25 '12 at 13:46
  • 3
    "If you do a romanization library" I don't. It just does a simple "transliterate every letter to English/Latin". It is FAR from perfect, but it works for many languages. For example http://dotabro.com/player/76561198060110736/madgaming-crio-j-jinmawang I'm using it for links, where the result is "better than nothing". – Dima Stefantsov Dec 25 '12 at 18:46
  • 1
    @DimaStefantsov transliteration assumes one-to-one mapping from characters of one script into another. Converting 'ä' into 'a' breaks this rule, since there is no way to uniquely transliterate the text back to the original (there is no way of knowing what character the 'a' represented in the original). There are clearly defined transliteration standards that define how a certain writing system is transliterated. Even if you have the same word in Cyrillic, it will be romanized differently in Bulgarian than in Serbian. You cannot do transliteration without knowing which language the text is in. – Igor Brejc Dec 06 '16 at 09:24
  • 1
    @DimaStefantsov (continued). My point is that you shouldn't claim your library does transliteration when it does not. But I know the source libraries you used also claim that, so I guess it's their original sin. To extend your library to cover transliteration, you would need to provide separate rulesets for different languages (Russian, Serbian, Greek etc.) - even separate rulesets for different transliteration standards (Modern Greek has at least 5 or 6 different ones). And provide a language ID (culture ID) as an input parameter for transliteration. – Igor Brejc Dec 06 '16 at 09:32
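
The point about separate rule sets per language can be sketched roughly as follows. This is only an illustration, not part of UnidecodeSharpFork: the `LanguageAwareTranslit` class, its `Transliterate` method and the language codes "ru" and "bg" are hypothetical names, and the tables contain just a few letters to show how the same Cyrillic letter is romanized differently in Russian and Bulgarian.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: one rule table per language, selected by a language code.
public static class LanguageAwareTranslit
{
    private static readonly Dictionary<string, Dictionary<char, string>> Rules =
        new Dictionary<string, Dictionary<char, string>>
        {
            // Russian, BGN/PCGN-style values (tiny illustrative subset).
            { "ru", new Dictionary<char, string>
                { { 'щ', "shch" }, { 'х', "kh" }, { 'и', "i" }, { 'т', "t" } } },
            // Bulgarian values for the same letters (tiny illustrative subset).
            { "bg", new Dictionary<char, string>
                { { 'щ', "sht" }, { 'х', "h" }, { 'и', "i" }, { 'т', "t" } } }
        };

    public static string Transliterate(string text, string language)
    {
        Dictionary<char, string> table;
        if (!Rules.TryGetValue(language, out table))
        {
            throw new ArgumentException("No rule set for language: " + language, "language");
        }

        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Letters without a rule are copied through unchanged.
            string latin;
            result.Append(table.TryGetValue(c, out latin) ? latin : c.ToString());
        }
        return result.ToString();
    }
}
```

With these tables, `Transliterate("щит", "ru")` gives "shchit" while `Transliterate("щит", "bg")` gives "shtit", which is why the language has to be an input.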
22
    public static string Translit(string str)
    {
        string[] lat_up = {"A", "B", "V", "G", "D", "E", "Yo", "Zh", "Z", "I", "Y", "K", "L", "M", "N", "O", "P", "R", "S", "T", "U", "F", "Kh", "Ts", "Ch", "Sh", "Shch", "\"", "Y", "'", "E", "Yu", "Ya"};
        string[] lat_low = {"a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "\"", "y", "'", "e", "yu", "ya"};
        string[] rus_up = {"А", "Б", "В", "Г", "Д", "Е", "Ё", "Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П", "Р", "С", "Т", "У", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ы", "Ь", "Э", "Ю", "Я"};
        string[] rus_low = { "а", "б", "в", "г", "д", "е", "ё", "ж", "з", "и", "й", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ч", "ш", "щ", "ъ", "ы", "ь", "э", "ю", "я"};
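        // Replace each Cyrillic letter with the Latin string at the same index.
        // The four arrays must stay the same length and in the same order.
        for (int i = 0; i < rus_up.Length; i++)
        {
            str = str.Replace(rus_up[i], lat_up[i]);
            str = str.Replace(rus_low[i], lat_low[i]);
        }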
        return str;
    }
Romkar
10

This is an optimized version of Sarvar Nishonboyev's answer. It seems like the simplest solution, and it avoids the overhead of re-creating the string on every iteration:

using System.Collections.Generic;
using System.Text;

public static class Converter
{
    private static readonly Dictionary<char, string> ConvertedLetters = new Dictionary<char, string>
    {
        {'а', "a"},
        {'б', "b"},
        {'в', "v"},
        {'г', "g"},
        {'д', "d"},
        {'е', "e"},
        {'ё', "yo"},
        {'ж', "zh"},
        {'з', "z"},
        {'и', "i"},
        {'й', "j"},
        {'к', "k"},
        {'л', "l"},
        {'м', "m"},
        {'н', "n"},
        {'о', "o"},
        {'п', "p"},
        {'р', "r"},
        {'с', "s"},
        {'т', "t"},
        {'у', "u"},
        {'ф', "f"},
        {'х', "h"},
        {'ц', "c"},
        {'ч', "ch"},
        {'ш', "sh"},
        {'щ', "sch"},
        {'ъ', "j"},
        {'ы', "i"},
        {'ь', "j"},
        {'э', "e"},
        {'ю', "yu"},
        {'я', "ya"},
        {'А', "A"},
        {'Б', "B"},
        {'В', "V"},
        {'Г', "G"},
        {'Д', "D"},
        {'Е', "E"},
        {'Ё', "Yo"},
        {'Ж', "Zh"},
        {'З', "Z"},
        {'И', "I"},
        {'Й', "J"},
        {'К', "K"},
        {'Л', "L"},
        {'М', "M"},
        {'Н', "N"},
        {'О', "O"},
        {'П', "P"},
        {'Р', "R"},
        {'С', "S"},
        {'Т', "T"},
        {'У', "U"},
        {'Ф', "F"},
        {'Х', "H"},
        {'Ц', "C"},
        {'Ч', "Ch"},
        {'Ш', "Sh"},
        {'Щ', "Sch"},
        {'Ъ', "J"},
        {'Ы', "I"},
        {'Ь', "J"},
        {'Э', "E"},
        {'Ю', "Yu"},
        {'Я', "Ya"}
    };

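    public static string ConvertToLatin(string source)
    {
        var result = new StringBuilder(source.Length);
        foreach (var letter in source)
        {
            // Characters without a mapping (spaces, digits, Latin letters, punctuation)
            // are copied through unchanged instead of throwing a KeyNotFoundException.
            string converted;
            result.Append(ConvertedLetters.TryGetValue(letter, out converted) ? converted : letter.ToString());
        }
        return result.ToString();
    }
}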

Use it like this:

Converter.ConvertToLatin("Проверочный текст");
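
Since the original goal was a slug, a transliterated string could then be passed through a small helper like the one below. This is a minimal sketch under the assumption that the input is already Latin text; `SlugHelper` and `Slugify` are hypothetical names, not part of any library mentioned here.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: turns already-transliterated Latin text into a slug.
public static class SlugHelper
{
    public static string Slugify(string latinText)
    {
        // Lowercase first so the character class below only needs a-z.
        string slug = latinText.ToLowerInvariant();
        // Collapse every run of characters other than a-z and 0-9 into one hyphen.
        slug = Regex.Replace(slug, "[^a-z0-9]+", "-");
        // Drop hyphens left over at the start or end.
        return slug.Trim('-');
    }
}
```

For example, `SlugHelper.Slugify("Alpha Bravo Charlie")` returns "alpha-bravo-charlie", and `SlugHelper.Slugify("Proverochnij tekst")` returns "proverochnij-tekst".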
Schnapz
  • 2
    Note that this code will throw an exception if there is some latin character in the source string... – Konrad Apr 15 '21 at 13:52
8

Why can't you just take a transliteration table and make a small regex or subroutine?
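
A minimal sketch of that idea, assuming a BGN/PCGN-style table for Russian like the one used in other answers here: a regex finds each Cyrillic character and a lookup table supplies the replacement. The `RegexTranslit` class and `ToLatin` method are hypothetical names, and the table is only one of several possible romanizations.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTranslit
{
    // Lowercase Russian letters only; uppercase output is derived in ToLatin.
    private static readonly Dictionary<string, string> Table = new Dictionary<string, string>
    {
        { "а", "a" }, { "б", "b" }, { "в", "v" }, { "г", "g" }, { "д", "d" },
        { "е", "e" }, { "ё", "yo" }, { "ж", "zh" }, { "з", "z" }, { "и", "i" },
        { "й", "y" }, { "к", "k" }, { "л", "l" }, { "м", "m" }, { "н", "n" },
        { "о", "o" }, { "п", "p" }, { "р", "r" }, { "с", "s" }, { "т", "t" },
        { "у", "u" }, { "ф", "f" }, { "х", "kh" }, { "ц", "ts" }, { "ч", "ch" },
        { "ш", "sh" }, { "щ", "shch" }, { "ъ", "\"" }, { "ы", "y" }, { "ь", "'" },
        { "э", "e" }, { "ю", "yu" }, { "я", "ya" }
    };

    // Matches any single character in the Cyrillic Unicode block.
    private static readonly Regex CyrillicChar = new Regex(@"\p{IsCyrillic}");

    public static string ToLatin(string text)
    {
        return CyrillicChar.Replace(text, match =>
        {
            string latin;
            if (!Table.TryGetValue(match.Value.ToLowerInvariant(), out latin))
            {
                return match.Value; // Cyrillic letter not in the table: leave as is
            }
            // Capitalize the first Latin letter when the source letter was uppercase.
            bool isUpper = char.IsUpper(match.Value, 0);
            return isUpper ? char.ToUpperInvariant(latin[0]) + latin.Substring(1) : latin;
        });
    }
}
```

Because every character is replaced in a single pass, Latin letters, digits and punctuation in the input are left untouched.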

Massimiliano
7

You can use my library for transliteration: https://github.com/nick-buhro/Translit
It is also available on NuGet.

Example:

var latin = Transliteration.CyrillicToLatin(
    "Предками данная мудрость народная!", 
    Language.Russian);

Console.WriteLine(latin);   
// Output: Predkami dannaya mudrost` narodnaya!
Nick Buhro
6

Check this code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

namespace Transliter
{
    public partial class Form1 : Form
    {
        Dictionary<string, string> words = new Dictionary<string, string>();

        public Form1()
        {
            InitializeComponent();
            words.Add("а", "a");
            words.Add("б", "b");
            words.Add("в", "v");
            words.Add("г", "g");
            words.Add("д", "d");
            words.Add("е", "e");
            words.Add("ё", "yo");
            words.Add("ж", "zh");
            words.Add("з", "z");
            words.Add("и", "i");
            words.Add("й", "j");
            words.Add("к", "k");
            words.Add("л", "l");
            words.Add("м", "m");
            words.Add("н", "n");
            words.Add("о", "o");
            words.Add("п", "p");
            words.Add("р", "r");
            words.Add("с", "s");
            words.Add("т", "t");
            words.Add("у", "u");
            words.Add("ф", "f");
            words.Add("х", "h");
            words.Add("ц", "c");
            words.Add("ч", "ch");
            words.Add("ш", "sh");
            words.Add("щ", "sch");
            words.Add("ъ", "j");
            words.Add("ы", "i");
            words.Add("ь", "j");
            words.Add("э", "e");
            words.Add("ю", "yu");
            words.Add("я", "ya");
            words.Add("А", "A");
            words.Add("Б", "B");
            words.Add("В", "V");
            words.Add("Г", "G");
            words.Add("Д", "D");
            words.Add("Е", "E");
            words.Add("Ё", "Yo");
            words.Add("Ж", "Zh");
            words.Add("З", "Z");
            words.Add("И", "I");
            words.Add("Й", "J");
            words.Add("К", "K");
            words.Add("Л", "L");
            words.Add("М", "M");
            words.Add("Н", "N");
            words.Add("О", "O");
            words.Add("П", "P");
            words.Add("Р", "R");
            words.Add("С", "S");
            words.Add("Т", "T");
            words.Add("У", "U");
            words.Add("Ф", "F");
            words.Add("Х", "H");
            words.Add("Ц", "C");
            words.Add("Ч", "Ch");
            words.Add("Ш", "Sh");
            words.Add("Щ", "Sch");
            words.Add("Ъ", "J");
            words.Add("Ы", "I");
            words.Add("Ь", "J");
            words.Add("Э", "E");
            words.Add("Ю", "Yu");
            words.Add("Я", "Ya");
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string source = textBox1.Text;
            foreach (KeyValuePair<string, string> pair in words)
            {
                source = source.Replace(pair.Key, pair.Value);
            }
            textBox2.Text = source;
        }
    }
}

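Cyrillic to Latin:

source = source.Replace(pair.Key, pair.Value);

Latin to Cyrillic (only approximate: several Cyrillic letters share one Latin value in this table, and single-letter replacements can break up digraphs such as "sh"):

source = source.Replace(pair.Value, pair.Key);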
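
For the Latin-to-Cyrillic direction, the order of replacements matters, because a single-letter key such as "s" can consume part of a longer key such as "sch". One possible way around that is to scan the input once and always try the longest key first. The sketch below assumes a small lowercase-only subset of the table above; `ReverseTranslit` and `ToCyrillic` are hypothetical names, and the result is still a best guess wherever the forward mapping is ambiguous.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReverseTranslit
{
    // Illustrative lowercase subset; keys are Latin, values are Cyrillic.
    private static readonly Dictionary<string, string> LatinToCyrillic = new Dictionary<string, string>
    {
        { "sch", "щ" }, { "sh", "ш" }, { "ch", "ч" }, { "zh", "ж" },
        { "yu", "ю" }, { "ya", "я" }, { "yo", "ё" },
        { "a", "а" }, { "k", "к" }, { "s", "с" }, { "u", "у" },
        { "h", "х" }, { "c", "ц" }, { "t", "т" }, { "o", "о" }
    };

    private const int MaxKeyLength = 3; // length of the longest key ("sch")

    public static string ToCyrillic(string latin)
    {
        var result = new StringBuilder(latin.Length);
        int i = 0;
        while (i < latin.Length)
        {
            bool matched = false;
            // Try the longest possible key at this position first.
            for (int len = Math.Min(MaxKeyLength, latin.Length - i); len >= 1; len--)
            {
                string cyrillic;
                if (LatinToCyrillic.TryGetValue(latin.Substring(i, len), out cyrillic))
                {
                    result.Append(cyrillic);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched)
            {
                // No key starts here: copy the character through unchanged.
                result.Append(latin[i]);
                i++;
            }
        }
        return result.ToString();
    }
}
```

With this table, `ReverseTranslit.ToCyrillic("schuka")` returns "щука" rather than splitting "sch" into separate letters.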
Sarvar Nishonboyev
5

Microsoft has a transliteration tool which includes a DLL you could hook into (you would need to check licensing restrictions if you're going to use it non-personally). You can read more about it in Dejan Vesić's blog post

jball
4

For future readers

Windows 7+ can do this with its Extended Linguistic Services. (You'll need the Windows API Code Pack to do it from .NET)

Arithmomaniac
1

Here is a great article that describes how to make a C# equivalent of this JavaScript one.

string result = DisplayInEnglish("Олъга Виктровна Василенко");
-2

Use this method: pass it a string containing Cyrillic text, and it returns the corresponding Latin string. The mapping follows Ukrainian rather than Russian conventions (for example, г becomes h and и becomes y).

public static string GetLatinCodeFromCyrillic(string str)
    {

        str = str.Replace("б", "b");
        str = str.Replace("Б", "B");

        str = str.Replace("в", "v");
        str = str.Replace("В", "V");

        str = str.Replace("г", "h");
        str = str.Replace("Г", "H");

        str = str.Replace("ґ", "g");
        str = str.Replace("Ґ", "G");

        str = str.Replace("д", "d");
        str = str.Replace("Д", "D");

        str = str.Replace("є", "ye");
        str = str.Replace("Є", "Ye");

        str = str.Replace("ж", "zh");
        str = str.Replace("Ж", "Zh");

        str = str.Replace("з", "z");
        str = str.Replace("З", "Z");

        str = str.Replace("и", "y");
        str = str.Replace("И", "Y");

        str = str.Replace("ї", "yi");
        str = str.Replace("Ї", "YI");

        str = str.Replace("й", "j");
        str = str.Replace("Й", "J");

        str = str.Replace("к", "k");
        str = str.Replace("К", "K");

        str = str.Replace("л", "l");
        str = str.Replace("Л", "L");

        str = str.Replace("м", "m");
        str = str.Replace("М", "M");

        str = str.Replace("н", "n");
        str = str.Replace("Н", "N");

        str = str.Replace("п", "p");
        str = str.Replace("П", "P");

        str = str.Replace("р", "r");
        str = str.Replace("Р", "R");

        str = str.Replace("с", "s");
        str = str.Replace("С", "S");

        str = str.Replace("ч", "ch");
        str = str.Replace("Ч", "CH");

        str = str.Replace("ш", "sh");
        str = str.Replace("Ш", "SH");

        str = str.Replace("щ", "shh");
        str = str.Replace("Щ", "SHH");

        str = str.Replace("ю", "yu");
        str = str.Replace("Ю", "YU");

        str = str.Replace("Я", "YA");
        str = str.Replace("я", "ya");

        str = str.Replace('ь', '"');
        str = str.Replace("Ь", "");

        str = str.Replace('т', 't');
        str = str.Replace("Т", "T");

        str = str.Replace('ц', 'c');
        str = str.Replace("Ц", "C");

        str = str.Replace('о', 'o');
        str = str.Replace("О", "O");

        str = str.Replace('е', 'e');
        str = str.Replace("Е", "E");

        str = str.Replace('а', 'a');
        str = str.Replace("А", "A");

        str = str.Replace('ф', 'f');
        str = str.Replace("Ф", "F");

        str = str.Replace('і', 'i');
        str = str.Replace("І", "I");

        str = str.Replace('У', 'U');
        str = str.Replace("у", "u");

        str = str.Replace('х', 'x');
        str = str.Replace("Х", "X");
        return str;
    }
Sergey Glotov
Pritesh