Let's say I have words in a non-English language, like:

 Tomáš
 Babätká
 Vôľa

and I want to replace the non-English/non-standard characters (I don't know what they're called) with the closest-resembling English letters,

so that I would get:

 Tomas
 Babatka
 Vola

So, for example, áǎä would become a, and óôö would become o.

Is there a Ruby gem (or maybe something built into Ruby itself) that maps non a-z characters to their closest-resembling a-z character? Or is my only option to write the entire mapping myself?
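Writing the mapping yourself can be as small as one `String#tr` call with two equal-length character lists. A minimal sketch, assuming only the letters from the examples above need handling; the `FROM`/`TO` constants and `simple_transliterate` are hypothetical names, and any character not listed passes through unchanged, which is the main drawback of this approach.

```ruby
# Hand-written mapping: each character in FROM is replaced by the
# character at the same position in TO. Only the letters listed here
# are handled; everything else passes through unchanged.
FROM = "áǎäóôöšľÁǍÄÓÔÖŠĽ"
TO   = "aaaoooslAAAOOOSL"

def simple_transliterate(word)
  word.tr(FROM, TO)
end

%w[Tomáš Babätká Vôľa].each { |w| puts simple_transliterate(w) }
# Tomas
# Babatka
# Vola
```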

These characters are used in languages such as Czech or Slovak to indicate pronunciation. I don't mean characters from Arabic, Chinese, Japanese, or the azbuka (Cyrillic) set.

Why I need it: for example, I want to generate pretty URLs like https://www.sajtka.com/category/babatka
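Ruby's standard library can handle most of these letters without a gem: `String#unicode_normalize(:nfd)` (Ruby 2.2+) splits a letter such as `á` into a base letter followed by a combining accent, and the accents can then be removed with the `\p{Mn}` (nonspacing mark) regex property. A minimal sketch for the URL use case; `strip_diacritics` and `slugify` are hypothetical helper names. Letters that have no Unicode decomposition, such as `ø`, `ł`, `ß`, or `Æ`, are not converted by this step, so `slugify` would treat them as separators.

```ruby
# Decompose each letter into base letter + combining marks (NFD),
# then delete the combining marks (Unicode category Mn).
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Build a URL path segment: strip accents, lowercase, collapse every
# run of characters outside [a-z0-9] into one hyphen, trim the ends.
def slugify(text)
  strip_diacritics(text)
    .downcase
    .gsub(/[^a-z0-9]+/, "-")
    .gsub(/\A-+|-+\z/, "")
end

puts strip_diacritics("Tomáš") # Tomas
puts strip_diacritics("Vôľa")  # Vola
puts slugify("Babätká")        # babatka
```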

equivalent8
  • Time to search Ruby gems. Not sure what you'll do with Ɔ or ẞ. – tadman Oct 29 '20 at 20:59
  • What's wrong with UTF-8 in your URLs? – tadman Oct 29 '20 at 20:59
  • Sorry, I wasn't specific enough. I was thinking only about characters that resemble a-z characters, so not Arabic, Chinese, Japanese, or the azbuka set – equivalent8 Oct 29 '20 at 21:34
  • As for UTF-8 in URLs: it's really bad for copy-pasting (e.g. someone may post a link to an old forum whose charset doesn't support those characters, and you end up with a non-functioning URL). It's just awkward. – equivalent8 Oct 29 '20 at 21:36
  • "I was thinking about just characters that resembels a-z chars." – `ẞ` falls into that category. It is a combination of a long S and a Z, and takes on the function of "SS". "so for `áǎä` I would translate them to `a` and `óôö` => `o`" – That is wrong, though. E.g. my name should be transliterated to Joerg, not Jorg. I will not get *offended* if it is wrongly transliterated, but I *will* reconsider trusting my data to a site that cannot even process my name without butchering it. Also note that your proposed scheme will change at least one innocent word into a massively offensive one. – Jörg W Mittag Oct 30 '20 at 05:27
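Following the comment above, one way to respect language-specific conventions is to apply an explicit per-language table first and only then fall back to generic accent stripping. A sketch under that assumption; `GERMAN_RULES` is a hypothetical, deliberately incomplete table, and `transliterate` is a made-up helper name, not a library API.

```ruby
# Hypothetical per-language overrides, applied before generic stripping.
# Illustrative only; not a complete German transliteration standard.
GERMAN_RULES = {
  "ä" => "ae", "ö" => "oe", "ü" => "ue",
  "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue",
  "ß" => "ss", "ẞ" => "SS"
}.freeze

# Generic fallback: drop combining marks after NFD decomposition.
def strip_diacritics(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Compose first (NFC) so "ö" matches whether it arrived precomposed or
# as "o" + combining diaeresis, apply the explicit rules, then strip
# whatever accents remain.
def transliterate(text, rules = {})
  composed = text.unicode_normalize(:nfc)
  replaced = rules.empty? ? composed : composed.gsub(Regexp.union(rules.keys), rules)
  strip_diacritics(replaced)
end

puts transliterate("Jörg")               # Jorg
puts transliterate("Jörg", GERMAN_RULES) # Joerg
```

Which table to apply has to come from somewhere, for example the language of the content being slugged; the generic fallback alone turns `Jörg` into `Jorg`.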

2 Answers

You can use the ActiveSupport gem (part of Rails) for that. `ActiveSupport::Inflector.transliterate`, which delegates to `I18n.transliterate`, replaces non-ASCII characters with an ASCII approximation or, if none exists, with a replacement character that defaults to “?”.

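# Outside Rails, load ActiveSupport's inflector (which also loads I18n);
# inside a Rails app these requires are unnecessary.
require "active_support"
require "active_support/inflector"

I18n.transliterate("Ærøskøbing")
# => "AEroskobing"

I18n.transliterate("日本語")
# => "???"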

More info: https://apidock.com/rails/ActiveSupport/Inflector/transliterate
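For plain Ruby scripts without ActiveSupport or i18n, the behavior described above (an ASCII approximation where one exists, otherwise a replacement character) can be roughly imitated with the standard library. This is a sketch, not how i18n is implemented: `LIGATURES` and `approximate` are hypothetical names, and the table covers only a few letters that have no Unicode decomposition, whereas the libraries ship much larger tables.

```ruby
# Letters that have no Unicode decomposition need explicit entries.
LIGATURES = {
  "Æ" => "AE", "æ" => "ae", "Ø" => "O", "ø" => "o",
  "Ł" => "L", "ł" => "l", "Ð" => "D", "ð" => "d", "ß" => "ss"
}.freeze

# 1. NFD-decompose and drop combining accents.
# 2. Map each remaining non-ASCII character through LIGATURES,
#    falling back to the replacement string.
def approximate(text, replacement: "?")
  text.unicode_normalize(:nfd)
      .gsub(/\p{Mn}/, "")
      .gsub(/[^\x00-\x7F]/) { |ch| LIGATURES.fetch(ch, replacement) }
end

puts approximate("Ærøskøbing") # AEroskobing
puts approximate("日本語")      # ???
```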

Alexey Schepin
  • Thank you, definitely a good option for Rails, but I'm looking for something lower-level (e.g. for simple Ruby scripts). – equivalent8 Oct 29 '20 at 21:37
  • @equivalent8 you don't have to use the whole of Rails for this. You can simply add `gem 'i18n'` on its own. https://github.com/ruby-i18n/i18n/blob/a29934e07c8eb7e84ace35f45b6d77cd0bfe123c/lib/i18n.rb#L276 – Alexey Schepin Oct 29 '20 at 21:52

Found it! (Kind of... it would be nicer if it didn't extend `String`.)

https://github.com/fractalsoft/diacritics

require "diacritics"

# Mixes the gem's helpers into every String in the process.
String.send(:include, Diacritics::String)
"Łorem ìpsum ÐolÓr. Šit ämet".permanent #=> "lorem-ipsum-dolor-sit-aemet"
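To avoid extending `String` globally, the same kind of conversion can live in a module function that is called explicitly (Ruby refinements are another option for scoping a `String` method to a single file). A sketch of the module approach; `Slug.permanent` is a hypothetical name chosen to mirror the gem's method, not part of the diacritics gem, and its `EXTRA` table covers only the letters in the example string.

```ruby
module Slug
  # Letters without a Unicode decomposition need explicit entries.
  EXTRA = {
    "Ł" => "L", "ł" => "l", "Ð" => "D", "ð" => "d",
    "Ø" => "O", "ø" => "o", "Æ" => "AE", "æ" => "ae"
  }.freeze

  module_function

  # Returns a lowercase, hyphen-separated ASCII slug.
  # String itself is never modified or extended.
  def permanent(text)
    text.unicode_normalize(:nfd)
        .gsub(/\p{Mn}/, "")
        .gsub(Regexp.union(EXTRA.keys), EXTRA)
        .downcase
        .gsub(/[^a-z0-9]+/, "-")
        .gsub(/\A-+|-+\z/, "")
  end
end

puts Slug.permanent("Łorem ìpsum ÐolÓr. Šit ämet") # lorem-ipsum-dolor-sit-amet
```

Note that this sketch maps `ä` to plain `a`, as the question asked, whereas the gem output above produces `ae`.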
equivalent8