
In Ruby 1.9.3 we can use Iconv with the //TRANSLIT flag, which replaces accented characters with their ASCII equivalents, but only when those characters are not present in the destination encoding.

Example of use:

# encoding: utf-8
# (Ruby 1.9.3 needs this magic comment for non-ASCII string literals)
require 'iconv'

z = "Håkan"
Iconv.conv("windows-1250//TRANSLIT", "UTF-8", z)
# => "Hakan" (diacritic removed, because "å" is not in Windows-1250)
pl = "zażółć"
Iconv.conv("windows-1250//TRANSLIT", "UTF-8", pl)
# => "zażółć" unchanged, because Windows-1250 contains all of these characters
# (irb actually shows "za\xBF\xF3\xB3\xE6", i.e. the raw Windows-1250 bytes)

However, Iconv is deprecated (it was removed from the standard library in Ruby 2.0), and String#encode is the recommended replacement.

But #encode raises an error instead of transliterating:

z.encode('windows-1250', 'utf-8')
# raises Encoding::UndefinedConversionError: U+00E5 to WINDOWS-1250 in conversion from UTF-8 to WINDOWS-1250
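For reference, a minimal sketch (assuming Ruby 2.0+) of what String#encode offers out of the box: with `undef: :replace` it stops raising and substitutes a placeholder (set explicitly here with `replace:`) for every character missing from the target encoding. This is a substitution, not a transliteration, so it does not solve the problem by itself:

```ruby
z = "Håkan"

# undef: :replace swaps characters missing from Windows-1250 for the replace: string
# instead of raising Encoding::UndefinedConversionError.
placeholder = z.encode('windows-1250', 'utf-8', undef: :replace, replace: '?')
p placeholder # => "H?kan"
```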

Is there a way to get behavior similar to Iconv's //TRANSLIT flag using String#encode in Ruby 2+?

Esse
  • hm, your example gives me `invalid multibyte char (US-ASCII) syntax error, unexpected $end, expecting ')'` in Ruby 1.9.3-p448 – Rustam Gasanov Feb 16 '15 at 11:48
  • You've identified there's a difference between the two methods but haven't shared what the difference is. What's the desired output of your conversion? – Anthony Feb 16 '15 at 11:49
  • @RustamA.Gasanov - I've made the example simpler, maybe now it will be ok. – Esse Feb 16 '15 at 12:01
  • @Anthony - I've added information about desired result (and difference) – Esse Feb 16 '15 at 12:02
  • possible duplicate of [Transliteration in ruby](http://stackoverflow.com/questions/1726404/transliteration-in-ruby) – SztupY Feb 16 '15 at 12:03
  • @SztupY not a duplicate, this question is about using `String#encode`, that question sticks with `iconv` or external gems... – Esse Feb 16 '15 at 12:04
  • Not a complete equivalent, but you can use this to remove accents: http://stackoverflow.com/questions/1726404/transliteration-in-ruby/16179878#16179878 – SztupY Feb 16 '15 at 12:05
  • @Esse: the question is about how to do something that is deprecated in ruby 2.0. If you can only do it via external gems, or workarounds (as iconv has its own logic to do transliteration, so not using iconv means you have to do the logic somewhere else), then that's the answer. – SztupY Feb 16 '15 at 12:06
  • @SztupY: I've added some clarification to the question (which actually makes `UnicodeUtils` pretty useless to me) – Esse Feb 16 '15 at 12:11

1 Answer


If you know in advance which characters to expect, you can pass the replacements as a hash through the :fallback option:

z = "Håkan"
p z.encode('windows-1250', 'utf-8', fallback: {"å"=>"a"}) # => Hakan
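The :fallback option also accepts a Proc, which is called with each unconvertible character, so the mapping does not have to be written out in advance. The sketch below assumes Ruby 2.2+ (for String#unicode_normalize); the helper name `translit_encode` is hypothetical. It decomposes each unconvertible character, strips combining marks (Unicode category Mn), and falls back to "?" when even the base character cannot be represented. It only approximates iconv's //TRANSLIT: characters without a canonical decomposition (for example "ø", or "ł" when the target lacks it) become "?" rather than being mapped the way iconv's tables might.

```ruby
# Hypothetical helper: approximate Iconv's //TRANSLIT with String#encode (Ruby 2.2+).
# Characters the target encoding already has are kept; only unconvertible ones
# are passed to the fallback.
def translit_encode(str, target)
  fallback = lambda do |char|
    # Decompose (e.g. "å" -> "a" + U+030A COMBINING RING ABOVE) and drop combining marks.
    base = char.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
    begin
      base.encode(target) # probe only: is the stripped form representable?
      base
    rescue Encoding::UndefinedConversionError
      '?' # no representable base form, use a placeholder
    end
  end
  str.encode(target, fallback: fallback)
end

p translit_encode("Håkan", "windows-1250")        # => "Hakan"
p translit_encode("zażółć", "windows-1250").bytes # => [122, 97, 191, 243, 179, 230]
```

Because the fallback only runs for characters missing from the target, "zażółć" keeps its Polish letters in Windows-1250, matching the iconv behavior in the question.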
steenslag
  • Unfortunately it depends on both the input and the destination encoding, so it isn't an acceptable solution :( – Esse Feb 17 '15 at 10:56