How do I transliterate Cyrillic characters in a string into Latin in Ruby? I can't find any docs on that. I thought there would be some standard function for it.
Certainly there's no standard function for this (in standard lib). – Sergio Tulentsev May 28 '14 at 09:36
6 Answers
You can use the translit gem:
require 'translit'
str = "Кириллица"
Translit.convert(str, :english)
#=> "Kirillica"

This transliterates in both directions uncontrollably, but I need only Cyrillic to Latin, not vice versa. The string might contain both, and the Latin should stay untouched. I need it to make URL slugs. – Gherman May 28 '14 at 09:49
But how did you know it had this second parameter? Github's readme doesn't mention it! – Gherman May 28 '14 at 10:01
Read the code, it's 57 lines: https://github.com/tjbladez/translit/blob/master/lib/translit.rb – mdesantis May 28 '14 at 10:11
Unit tests are always a good place to start. He uses the second argument in this [test](https://github.com/tjbladez/translit/blob/master/test/basic_test.rb#L33) – Nick Feb 25 '15 at 16:04
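For the requirement raised in the comments above (convert only Cyrillic to Latin, leave Latin alone, produce URL slugs), a stdlib-only sketch is possible without any gem. The names `slugify` and `RU_TO_LATIN` are hypothetical, the letter table is one scheme I picked (not a specific standard such as GOST), and it assumes Ruby 2.4+ so that `downcase` also handles Cyrillic:

```ruby
# Hypothetical Russian-to-Latin table; the romanization scheme is an assumption.
RU_TO_LATIN = {
  'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
  'ё' => 'yo', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k',
  'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r',
  'с' => 's', 'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'kh', 'ц' => 'ts',
  'ч' => 'ch', 'ш' => 'sh', 'щ' => 'shch', 'ъ' => '', 'ы' => 'y', 'ь' => '',
  'э' => 'e', 'ю' => 'yu', 'я' => 'ya'
}.freeze

# Transliterate only Cyrillic letters, then turn the result into a URL slug.
def slugify(text)
  # \p{Cyrillic} matches only Cyrillic characters, so Latin text is never touched;
  # Cyrillic letters missing from the table are kept as-is at this step.
  latin = text.downcase.gsub(/\p{Cyrillic}/) { |ch| RU_TO_LATIN.fetch(ch, ch) }
  # Collapse every run of non-alphanumeric characters into one hyphen,
  # then trim hyphens from both ends.
  latin.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

puts slugify('Привет, Ruby World!') # => privet-ruby-world
```

Cyrillic letters outside the table (e.g. Ukrainian ї) survive the transliteration step but are then replaced by a hyphen in the slug step, so "Київ" becomes "ki-v"; extend the table if you need them.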
The most mature gem for working with Cyrillic/Russian is https://github.com/yaroslav/russian/
It also supports transliteration, alongside many other services:
require 'russian'
# => true
Russian.translit('Транслит, english letters untouched')
# => "Translit, english letters untouched"
It also provides pluralisation, date formatting, Rails i18n integration and many other goodies.
Disclaimer: I'm not in any sense affiliated with the gem, just a happy user.

There's a gem for that (dalibor). I haven't tried it, but it sounds promising...

I just tried dalibor with my name... it sort of worked, but it's not what I'm used to for the Cyrillic spelling of my name. :) – SteveTurczyn May 28 '14 at 09:46
This one fails to transliterate я, ю and some other letters for some reason. – Gherman May 28 '14 at 09:57
I can see the я and ю characters listed in the gem source (lib/alphabets/russian.yml), so I don't know why it's not working. I can also see it's missing the Latin "cz", which explains why my name isn't handled correctly. It has been maintained recently, so that's a positive. – SteveTurczyn May 28 '14 at 10:06
I don't remember "cz" ever being used in Cyrillic transliteration. This is new to me. – Gherman May 28 '14 at 17:25
Maybe it's a Ukrainian thing. My name is very common and pronounced "Turchin" – SteveTurczyn May 28 '14 at 19:10
def transliterate(cyrillic_string)
  ru = { 'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd',
         'е' => 'e', 'ё' => 'e', 'ж' => 'j', 'з' => 'z', 'и' => 'i',
         'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o',
         'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u',
         'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 'ш' => 'sh',
         'щ' => 'shch', 'ы' => 'y', 'э' => 'e', 'ю' => 'u', 'я' => 'ya',
         'й' => 'i', 'ъ' => '', 'ь' => '' }
  # Downcase first (Ruby 2.4+ also downcases Cyrillic), then map each character;
  # characters missing from the table are kept as they are.
  identifier = cyrillic_string.downcase.each_char.map { |char| ru.fetch(char, char) }.join
  identifier = identifier.gsub(/[^a-z0-9_]+/, '_') # any other run of characters => underscore
  identifier.gsub(/\A_+|_+\z/, '')                  # strip leading/trailing underscores
end

Does it work with capital letters too? Btw `_` is an underscore. A hyphen is a `-`. – Gherman Mar 23 '20 at 13:48
Also, although I did intend to use it for Russian, I didn't state this in the question. So for clarity, note that this may not work for some other Slavic languages, because they have other Cyrillic letters too. – Gherman Mar 23 '20 at 13:51
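On the capital-letter question in the comments: the answer above downcases everything, which is fine for identifiers but loses case. Below is a sketch that keeps case instead: it looks up the lowercase letter and capitalizes the replacement when the source letter was uppercase. `transliterate_keep_case`, `SINGLE`, `MULTI` and `LOWER_MAP` are hypothetical names and the table is my assumption. It also illustrates the point about other Slavic languages: letters not in the table (like Ukrainian ї) pass through unchanged.

```ruby
# Hypothetical tables (the scheme is an assumption): one-to-one letters built by
# zipping two equal-length strings, plus letters needing several Latin characters.
SINGLE = 'абвгдезийклмнопрстуфыэ'.chars.zip('abvgdeziyklmnoprstufye'.chars).to_h
MULTI = { 'ё' => 'yo', 'ж' => 'zh', 'х' => 'kh', 'ц' => 'ts', 'ч' => 'ch', 'ш' => 'sh',
          'щ' => 'shch', 'ъ' => '', 'ь' => '', 'ю' => 'yu', 'я' => 'ya' }.freeze
LOWER_MAP = SINGLE.merge(MULTI).freeze

def transliterate_keep_case(text)
  # Only Cyrillic characters are matched, so Latin text is left untouched.
  text.gsub(/\p{Cyrillic}/) do |ch|
    lower = ch.downcase
    latin = LOWER_MAP.fetch(lower, ch) # letters outside the table pass through unchanged
    # Uppercase source letter => capitalize its replacement ("Щ" => "Shch").
    ch == lower ? latin : latin.capitalize
  end
end

puts transliterate_keep_case('Москва, Red Square') # => Moskva, Red Square
```

One limitation of this per-letter approach: all-caps words come out mixed-case ("ЩУКА" becomes "ShchUKA"), because each letter is handled in isolation; fixing that would require looking at neighbouring letters.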
I didn't want to add a dependency, just wanted a simple thing in a script, so I did this:
transmap = [["кс", "x"], ["Кс", "X"], ["а", "a"], ["А", "A"], ["б", "b"], ["Б", "B"], ["в", "v"], ["В", "V"], ["г", "g"], ["Г", "G"], ["д", "d"], ["Д", "D"], ["е", "e"], ["Е", "E"], ["ё", "yo"], ["Ё", "Yo"], ["ё", "jo"], ["Ё", "Jo"], ["ё", "ö"], ["Ё", "Ö"], ["ж", "zh"], ["Ж", "Zh"], ["з", "z"], ["З", "Z"], ["и", "i"], ["И", "I"], ["й", "j"], ["Й", "J"], ["к", "k"], ["К", "K"], ["л", "l"], ["Л", "L"], ["м", "m"], ["М", "M"], ["н", "n"], ["Н", "N"], ["о", "o"], ["О", "O"], ["п", "p"], ["П", "P"], ["р", "r"], ["Р", "R"], ["с", "s"], ["С", "S"], ["т", "t"], ["Т", "T"], ["у", "u"], ["У", "U"], ["ф", "f"], ["Ф", "F"], ["х", "h"], ["Х", "H"], ["ц", "ts"], ["Ц", "Ts"], ["ч", "ch"], ["Ч", "Ch"], ["ш", "sh"], ["Ш", "Sh"], ["в", "w"], ["В", "W"], ["щ", "shch"], ["Щ", "Shch"], ["щ", "sch"], ["Щ", "Sch"], ["ъ", "#"], ["Ъ", "#"], ["ы", "y"], ["Ы", "Y"], ["ь", ""], ["Ь", ""], ["э", "je"], ["Э", "Je"], ["э", "ä"], ["Э", "Ä"], ["ю", "yu"], ["Ю", "Yu"], ["ю", "ju"], ["Ю", "Ju"], ["ю", "ü"], ["Ю", "Ü"], ["я", "ya"], ["Я", "Ya"], ["я", "ja"], ["Я", "Ja"], ["я", "q"], ["Я", "Q"]]
translit = ->(string) { transmap.inject(string) { |s, (k, v)| s.gsub(k, v) } }
translit.call("Пoo") # "Poo"
Note that Translit maps the same Cyrillic to multiple Latin strings, e.g. "я" to "q" and "ja" and "ya" – so this code (like Translit) will just pick one of those, of course.
That's it, but details below.
I generated transmap from https://github.com/tjbladez/translit/blob/master/lib/translit.rb with this snippet:
transmap = translit_map.flat_map { |k, (up, down)| [ [ down, k ], [ up, k.capitalize ] ] }.sort_by { |k, _| -k.length }
It needs to be sorted longest-first so it does кс => x before the one-letter transliterations.
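The inject/gsub chain above scans the string once per pair. A possible alternative, sketched here with a hypothetical, trimmed-down map, is to build one regex with `Regexp.union` and do a single `String#gsub` with the Hash as the replacement. The keys are still sorted longest-first, because Ruby's regex alternation takes the first alternative that matches at a given position, not the longest one. Each character is consumed only once, so no rule can re-match text produced by another rule:

```ruby
# Hypothetical, trimmed-down map in the same Cyrillic => Latin spirit as transmap.
TRANSMAP = {
  'кс' => 'x', 'Кс' => 'X',
  'а' => 'a', 'А' => 'A', 'и' => 'i', 'И' => 'I', 'к' => 'k', 'К' => 'K',
  'м' => 'm', 'М' => 'M', 'с' => 's', 'С' => 'S'
}.freeze

# Longest keys first, so "кс" is tried before "к" at the same position.
# Regexp.union escapes each string key.
PATTERN = Regexp.union(TRANSMAP.keys.sort_by { |k| -k.length })

# gsub with a Hash replaces each match with TRANSMAP[match];
# characters not matched by PATTERN are left as they are.
def translit(string)
  string.gsub(PATTERN, TRANSMAP)
end

puts translit('Максим') # => Maxim
```

If you turn the transmap array above into a Hash, note that `Array#to_h` keeps the last pair for a duplicated key (so я would become "q"), whereas `transmap.uniq(&:first).to_h` keeps the first one.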

Works with any locale (tested with en and fr):
def normalized_text
I18n.transliterate(text.downcase.strip)
end
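A caveat, as far as I can tell from the i18n gem's defaults: `I18n.transliterate` only ships approximations for accented Latin characters. Anything else it has no rule for, Cyrillic included, is replaced with "?" unless a transliteration rule is configured for the current locale (the `i18n.transliterate.rule` key). For the accented-Latin case, you can get a similar effect with the standard library alone. The sketch below uses Unicode NFD decomposition, which is not how I18n implements it (I18n uses a lookup table). `strip_diacritics` is a hypothetical name:

```ruby
# Hypothetical helper: decompose to NFD, then drop combining marks (\p{Mn}).
# Under NFD "é" becomes "e" + U+0301, so removing the mark leaves "e".
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_diacritics('Crème brûlée') # => Creme brulee
puts strip_diacritics('Привет')       # => Привет (nothing to decompose)
```

This does not transliterate Cyrillic, and it can alter Russian text in a surprising way: й and ё decompose to и and е plus a combining mark, so they come out as и and е.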
