
Possible Duplicate:
Transliteration in ruby

I am searching for a simple way to convert strings like these:

  • "spaß" to "spass"
  • "über" to "ueber"
  • etc.

This is needed for generating valid usernames from names of people.

Martin Klepsch
    German only? What do you want to do with something like _crêpes_ where the `ê` means that the origin of the word was _crespes_? What about _naïveté_, or _ça va_? According to another site, _"Georg Friedrich Händel is simplified into "Haendel" by the Germans and into "Handel" by the English (the latter is the spelling he used himself when he moved to London)."_ So how do you know which to pick? – Phrogz Feb 23 '12 at 15:47
  • Duplicate of [Transliteration in ruby](http://stackoverflow.com/questions/1726404/transliteration-in-ruby) and [Transliteration with Iconv in Ruby](http://stackoverflow.com/questions/4410340/transliteration-with-iconv-in-ruby) – Phrogz Feb 23 '12 at 15:59



This is called transliteration. An approximation of this (see examples) can be performed using the Iconv class.

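Try one of the following:

```ruby
require 'iconv' # in the stdlib up to Ruby 1.9; a separate gem since 2.0
# Iconv.iconv returns an Array; join it (Array#to_s only concatenates on 1.8)
Iconv.iconv('ascii//ignore//translit', 'utf-8', string).join
Iconv.iconv('ascii//translit', 'utf-8', string).join
```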

```
irb(main):013:0> Iconv.iconv('ascii//translit', 'utf-8', 'spaß').to_s
=> "spass"
irb(main):014:0> Iconv.iconv('ascii//translit', 'utf-8', 'crêpes').to_s
=> "crepes"
irb(main):017:0> Iconv.iconv('ascii//translit', 'utf-8', 'über').to_s
=> "uber"
```
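Iconv was removed from Ruby's standard library in 2.0 (it lives on as the separate `iconv` gem), so here is a rough stdlib-only approximation for newer Rubies. It is a sketch, not Iconv's algorithm: `approx_ascii` and `NO_DECOMPOSITION` are hypothetical names, the code assumes Ruby >= 2.2 for `String#unicode_normalize`, and the table only covers a few letters that have no Unicode decomposition (such as `ß`). Like the Iconv results above, it is language-agnostic, so `über` becomes `uber`.

```ruby
# Hypothetical helper: a rough, language-agnostic stand-in for Iconv's
# "ascii//translit", using only the standard library (Ruby >= 2.2).

# Letters with no Unicode decomposition; NFD alone cannot reduce these.
NO_DECOMPOSITION = {
  'ß' => 'ss', 'æ' => 'ae', 'Æ' => 'AE',
  'ø' => 'o', 'Ø' => 'O', 'œ' => 'oe', 'Œ' => 'OE'
}.freeze

def approx_ascii(string)
  string
    .gsub(Regexp.union(NO_DECOMPOSITION.keys), NO_DECOMPOSITION) # expand letters NFD cannot split
    .unicode_normalize(:nfd)   # "ê" -> "e" followed by U+0302 COMBINING CIRCUMFLEX ACCENT
    .gsub(/\p{Mn}/, '')        # drop the combining marks
    .encode('US-ASCII', invalid: :replace, undef: :replace, replace: '') # drop anything still non-ASCII
end

approx_ascii('spaß')   # => "spass"
approx_ascii('crêpes') # => "crepes"
approx_ascii('über')   # => "uber"
```

Decomposing and then stripping combining marks handles most accented Latin letters without a per-character table; only letters that Unicode treats as distinct (not "base letter + accent") need explicit entries.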

There's also an iconv command line utility. More information on that and some Ruby examples (search for 'ruby') here.

An alternative to this is Unidecode, which I guess was inspired by the original Perl implementation. I haven't used it in its Ruby incarnation, but it should do multi-char expansions (which apparently you want) better.
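To illustrate the multi-char expansion idea for the username use case in the question, here is a minimal sketch that uses only the standard library rather than Unidecode itself. `GERMAN_EXPANSIONS` and `username_from` are hypothetical; the table assumes German readings of the umlauts, and the rule that usernames contain only `a-z` and `0-9` is also an assumption. Listed letters are expanded first, and any remaining accents are stripped generically.

```ruby
# Hypothetical username generator. Assumptions: German readings of the
# umlauts are wanted, and usernames may contain only a-z and 0-9.
GERMAN_EXPANSIONS = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
  'ß' => 'ss'
}.freeze

def username_from(name)
  name
    .unicode_normalize(:nfc)   # compose "u" + U+0308 into "ü" so the table matches
    .gsub(Regexp.union(GERMAN_EXPANSIONS.keys), GERMAN_EXPANSIONS) # multi-char expansions first
    .unicode_normalize(:nfd)   # split the remaining accents: "é" -> "e" + U+0301
    .gsub(/\p{Mn}/, '')        # drop the combining marks
    .downcase
    .gsub(/[^a-z0-9]/, '')     # keep only characters allowed in a username
end

username_from('Jürgen Groß') # => "juergengross"
username_from('René Côté')   # => "renecote"
```

Names written entirely in non-Latin scripts come out as an empty string with this approach, so a real generator would need a fallback for that case.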

Finally, if you're running Rails, you may find this thread interesting. It details some differences between alternative approaches to transliteration and shows a way to do this within the Rails core (`ActiveSupport::Inflector.transliterate`).

Eduardo Ivanec
  • +1 Good answer; I didn't know of the term "transliteration" before, so thanks! Interesting to see that the conversion of über does not match the OP's desired outcome. Is the OP wrong to want to throw an `e` in, or is the transliteration library too language-agnostic? – Phrogz Feb 23 '12 at 15:54
  • I think it's too language-agnostic. The Unidecode distribution comes with pretty extensive data files detailing better multi-char transliterations, at least in principle. – Eduardo Ivanec Feb 23 '12 at 16:03
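The "too language-agnostic" point can be made concrete: the same letter has different conventional expansions depending on the language a name is read in (German `Haendel` versus the English spelling `Handel`), so the caller has to choose. In the sketch below, `RULES` and `transliterate_for` are hypothetical, and the tiny tables are illustrations, not Unidecode's data files.

```ruby
# Hypothetical per-language rule tables: the "right" expansion of a letter
# depends on the language the name is read in, so the caller must choose.
RULES = {
  de: { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ß' => 'ss' },
  en: { 'ß' => 'ss' } # assumption: otherwise just strip the accents
}.freeze

# Raises KeyError for a language with no rule table.
def transliterate_for(string, lang)
  rules = RULES.fetch(lang)
  expanded = string.unicode_normalize(:nfc).each_char.map { |c| rules.fetch(c, c) }.join
  expanded.unicode_normalize(:nfd).gsub(/\p{Mn}/, '') # generic fallback: drop accents
end

transliterate_for('Händel', :de) # => "Haendel"
transliterate_for('Händel', :en) # => "Handel"
```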