5

How do I transliterate Cyrillic characters in a string into Latin in Ruby? I can't find any docs on that. I thought there would be a standard function for it.

Patrick Oscity
Gherman

6 Answers

11

You can use the translit gem:

require 'translit'

str = "Кириллица"
Translit.convert(str, :english)
#=> "Kirillica"
Patrick Oscity
  • This transliterates in both directions uncontrollably, but I only need Cyrillic into Latin, not vice versa. The string might contain both, and the Latin should stay untouched. I need it to make URL slugs. – Gherman May 28 '14 at 09:49
  • But how did you know it had this second parameter? Github's readme doesn't mention it! – Gherman May 28 '14 at 10:01
  • Read the code, it's 57 lines https://github.com/tjbladez/translit/blob/master/lib/translit.rb – mdesantis May 28 '14 at 10:11
  • Unit tests are always a good place to start. He uses the second argument in this [test](https://github.com/tjbladez/translit/blob/master/test/basic_test.rb#L33) – Nick Feb 25 '15 at 16:04
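For the slug use case raised in the comments, where Latin text must stay untouched, a one-directional pass can be sketched in plain Ruby. This is only a sketch under assumptions: `CYR_TO_LAT` and `cyr_to_slug` are hypothetical names (not part of the translit gem), the table covers lowercase Russian letters only, and the romanization choices are one of several common schemes.

```ruby
# Hypothetical one-way table (lowercase Russian only).
# The romanization choices here are assumptions, not a standard.
CYR_TO_LAT = {
  'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
  'ё' => 'yo', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k',
  'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r',
  'с' => 's', 'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'ts',
  'ч' => 'ch', 'ш' => 'sh', 'щ' => 'shch', 'ъ' => '', 'ы' => 'y', 'ь' => '',
  'э' => 'e', 'ю' => 'yu', 'я' => 'ya'
}.freeze

def cyr_to_slug(text)
  text.downcase
      # Only Cyrillic characters are looked up; Latin letters never enter the table.
      .gsub(/\p{Cyrillic}/) { |ch| CYR_TO_LAT.fetch(ch, ch) }
      # Anything that is still not a-z or 0-9 (including unmapped Cyrillic) becomes a hyphen.
      .gsub(/[^a-z0-9]+/, '-')
      # Trim hyphens left at either end.
      .gsub(/\A-+|-+\z/, '')
end

puts cyr_to_slug('Привет, world 2014')  # prints "privet-world-2014"
```

Because the lookup is keyed on Cyrillic characters only, there is no reverse direction to trigger by accident. Downcasing Cyrillic with `String#downcase` assumes Ruby 2.4 or later.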
5

The most mature gem for working with Cyrillic/Russian is https://github.com/yaroslav/russian/

It also supports transliteration, along with many other features:

require 'russian'
# => true
Russian.translit('Транслит, english letters untouched')
# => "Translit, english letters untouched"

It also provides pluralization, date formatting, Rails i18n integration and many other goodies.

Disclaimer: I'm not affiliated with the gem in any way, just a happy user.

zverok
1

There's a gem for that. I haven't tried it but it sounds promising...

https://github.com/dalibor/cyrillizer

SteveTurczyn
  • I just tried the dalibor with my name... sort of worked but not what I'm used to for cyrillic spelling of my name. :) – SteveTurczyn May 28 '14 at 09:46
  • This one fails to transliterate я, ю and some other letters for some reason. – Gherman May 28 '14 at 09:57
  • I can see the я, ю characters listed in the gem source lib/alphabets/russian.yml so I don't know why it's not working. I can also see it's missing the latin "cz" which explains why my name isn't handled correctly. It's been maintained recently, so that's positive. – SteveTurczyn May 28 '14 at 10:06
  • I don't remember "cz" ever used in Cyrillic transliteration. This is something new for me. – Gherman May 28 '14 at 17:25
  • Maybe it's a Ukrainian thing. My name is very common and pronounced "Turchin" – SteveTurczyn May 28 '14 at 19:10
  • cz is also pretty common in Polish. – toolforger Jan 11 '20 at 12:27
0
def transliterate(cyrillic_string)
  # Lowercase Russian letters only; the input is downcased before lookup.
  ru = { 'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd',
         'е' => 'e', 'ё' => 'e', 'ж' => 'j', 'з' => 'z', 'и' => 'i',
         'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o',
         'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u',
         'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 'ш' => 'sh',
         'щ' => 'shch', 'ы' => 'y', 'э' => 'e', 'ю' => 'u', 'я' => 'ya',
         'й' => 'i', 'ъ' => '', 'ь' => '' }

  # Map each character, leaving anything not in the table unchanged.
  identifier = cyrillic_string.downcase.each_char.map { |char| ru.fetch(char, char) }.join

  identifier.gsub!(/[^a-z0-9_]+/, '_')   # remaining non-alphanumeric runs => underscore
  identifier.gsub(/\A[-_]+|[-_]+\z/, '') # strip hyphens/underscores at the beginning and end
end
prograils
  • Does it work with capital letters too? Btw `_` is an underscore. A hyphen is a `-`. – Gherman Mar 23 '20 at 13:48
  • Also although I indeed intended to use it for Russian, I did not state this in the question. So for clarity it should be noted that this may not work for some other Slavic languages because they may have some other Cyrillic letters too. – Gherman Mar 23 '20 at 13:51
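On the capital-letters question: the function above downcases its input, so capitals are accepted but the result is always lowercase. If case should survive, one possible approach is to look up the lowercase form and capitalize the result. The sketch below is an assumption-laden illustration, not the answer author's code: `LOWER` and `translit_keep_case` are hypothetical names, and the table is deliberately tiny.

```ruby
# Hypothetical lowercase table, deliberately tiny; extend it as needed.
LOWER = {
  'м' => 'm', 'о' => 'o', 'с' => 's', 'к' => 'k',
  'в' => 'v', 'а' => 'a', 'щ' => 'shch', 'у' => 'u'
}.freeze

def translit_keep_case(text)
  text.gsub(/\p{Cyrillic}/) do |ch|
    latin = LOWER[ch.downcase]
    if latin.nil?
      ch                # not in the table: leave the character as it is
    elsif ch == ch.downcase
      latin             # lowercase input: use the mapping directly
    else
      latin.capitalize  # uppercase input: "shch" becomes "Shch"
    end
  end
end

puts translit_keep_case('Москва')  # prints "Moskva"
puts translit_keep_case('Щука')    # prints "Shchuka"
```

Characters outside the table, such as letters used in other Slavic alphabets, pass through unchanged, which makes gaps in the table easy to spot. Case conversion of Cyrillic assumes Ruby 2.4 or later.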
0

I didn't want to add a dependency, just wanted a simple thing in a script, so I did this:

transmap = [["кс", "x"], ["Кс", "X"], ["а", "a"], ["А", "A"], ["б", "b"], ["Б", "B"], ["в", "v"], ["В", "V"], ["г", "g"], ["Г", "G"], ["д", "d"], ["Д", "D"], ["е", "e"], ["Е", "E"], ["ё", "yo"], ["Ё", "Yo"], ["ё", "jo"], ["Ё", "Jo"], ["ё", "ö"], ["Ё", "Ö"], ["ж", "zh"], ["Ж", "Zh"], ["з", "z"], ["З", "Z"], ["и", "i"], ["И", "I"], ["й", "j"], ["Й", "J"], ["к", "k"], ["К", "K"], ["л", "l"], ["Л", "L"], ["м", "m"], ["М", "M"], ["н", "n"], ["Н", "N"], ["о", "o"], ["О", "O"], ["п", "p"], ["П", "P"], ["р", "r"], ["Р", "R"], ["с", "s"], ["С", "S"], ["т", "t"], ["Т", "T"], ["у", "u"], ["У", "U"], ["ф", "f"], ["Ф", "F"], ["х", "h"], ["Х", "H"], ["ц", "ts"], ["Ц", "Ts"], ["ч", "ch"], ["Ч", "Ch"], ["ш", "sh"], ["Ш", "Sh"], ["в", "w"], ["В", "W"], ["щ", "shch"], ["Щ", "Shch"], ["щ", "sch"], ["Щ", "Sch"], ["ъ", "#"], ["Ъ", "#"], ["ы", "y"], ["Ы", "Y"], ["ь", ""], ["Ь", ""], ["э", "je"], ["Э", "Je"], ["э", "ä"], ["Э", "Ä"], ["ю", "yu"], ["Ю", "Yu"], ["ю", "ju"], ["Ю", "Ju"], ["ю", "ü"], ["Ю", "Ü"], ["я", "ya"], ["Я", "Ya"], ["я", "ja"], ["Я", "Ja"], ["я", "q"], ["Я", "Q"]]
translit = ->(string) { transmap.inject(string) { |s, (k, v)| s.gsub(k, v) } }

translit.call("Пoo")  # "Poo"

Note that Translit maps the same Cyrillic to multiple Latin strings, e.g. "я" to "q" and "ja" and "ya" – so this code (like Translit) will just pick one of those, of course.

That's it, but details below.


I generated transmap from https://github.com/tjbladez/translit/blob/master/lib/translit.rb with this snippet:

transmap = translit_map.flat_map { |k, (up, down)| [ [ down, k ], [ up, k.capitalize ] ] }.sort_by { |k, _| -k.length }

It needs to be sorted longest-first so it does кс => x before the one-letter transliterations.
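The effect of the longest-first ordering can be seen with a tiny, hypothetical map (the names `pairs` and `apply` are made up for this illustration; the real map is the generated `transmap`):

```ruby
# Tiny hypothetical map with the two-letter rule placed last on purpose.
pairs = [['т', 't'], ['а', 'a'], ['к', 'k'], ['с', 's'], ['и', 'i'], ['кс', 'x']]

# Same folding technique as the translit lambda: apply each substitution in order.
apply = ->(map, str) { map.inject(str) { |s, (from, to)| s.gsub(from, to) } }

unsorted = apply.call(pairs, 'такси')
sorted   = apply.call(pairs.sort_by { |from, _| -from.length }, 'такси')

puts unsorted  # prints "taksi": к and с were consumed before кс could match
puts sorted    # prints "taxi": кс => x ran first
```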

Henrik N
0

Works with any locale (tested with en and fr)

  def normalized_text
    I18n.transliterate(text.downcase.strip)
  end
Dorian