I would like to use Ruby 1.9.3 to replace accented UTF-8 characters with their ASCII equivalents. For example,
Acsády --> Acsady
The traditional way to do this is with Iconv, which is part of Ruby's standard library. You can do something like this:
# encoding: UTF-8
require 'iconv'
str = 'Acsády'
Iconv.conv('ASCII//TRANSLIT', 'UTF-8', str)
This yields
Acsa'dy
The apostrophes then have to be deleted separately. While this method still works in Ruby 1.9.3, I get a warning saying that Iconv is deprecated and that String#encode should be used instead.
However, String#encode does not offer exactly the same functionality. By default, undefined characters raise an exception (Encoding::UndefinedConversionError). You can handle them either by setting :undef=>:replace, which replaces each undefined character with the default replacement character '?', or by setting the :fallback option to a hash that maps each undefined source character to its replacement in the target encoding. I am wondering whether ready-made :fallback hashes are available in the standard library or through some gem, so that I don't have to write my own hash covering every possible accented character.
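For reference, a hand-written :fallback hash looks roughly like the sketch below, assuming :fallback behaves as described above. The names ACCENT_GROUPS, ASCII_FALLBACK and to_ascii are hypothetical, and the table covers only a handful of Latin accents; it is not a table shipped with Ruby or any gem.

```ruby
# encoding: UTF-8

# Hypothetical, hand-written table: each base letter with some of its
# accented variants (an illustrative subset, not a complete table).
ACCENT_GROUPS = {
  'a' => 'áàâäãå', 'e' => 'éèêë', 'i' => 'íìîï',
  'o' => 'óòôöõő', 'u' => 'úùûüű', 'n' => 'ñ', 'c' => 'ç',
  'A' => 'ÁÀÂÄ', 'E' => 'ÉÈÊË', 'I' => 'Í', 'O' => 'ÓÖŐ',
  'U' => 'ÚÜŰ', 'N' => 'Ñ', 'C' => 'Ç'
}

# Flatten the groups into the { accented char => ASCII replacement } hash
# that the :fallback option of String#encode expects.
ASCII_FALLBACK = {}
ACCENT_GROUPS.each do |base, variants|
  variants.each_char { |ch| ASCII_FALLBACK[ch] = base }
end

# Characters missing from the table are not covered: encode still raises
# Encoding::UndefinedConversionError for them.
def to_ascii(str)
  str.encode('US-ASCII', :fallback => ASCII_FALLBACK)
end

puts 'Acsády'.encode('US-ASCII', :undef => :replace)  # Acs?dy
puts to_ascii('Acsády')                               # Acsady
```

The flaw described above is visible here: every accented character you might encounter has to be listed by hand, and anything missing still raises.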
@raina77ow: Thanks for the response. That's exactly what I was looking for. However, after looking at the thread you linked to, I realized that a better solution may be to skip the conversion and instead match unaccented characters against their accented equivalents, the way a database does with an accent-insensitive collation. Does Ruby have anything equivalent to collations?
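Without a real collation library, one rough approximation is to fold each string to an accent-free, lowercase comparison key and compare or sort by that key. This is a minimal sketch assuming a small hand-written table; ACCENT_GROUPS, collation_key and accent_insensitive_equal? are hypothetical names, and the result only approximates what a locale-aware collation does.

```ruby
# encoding: UTF-8

# Hypothetical, hand-written table: each base letter with some of its
# accented variants (an illustrative subset, not a complete table).
ACCENT_GROUPS = {
  'a' => 'áàâäãå', 'e' => 'éèêë', 'i' => 'íìîï',
  'o' => 'óòôöõő', 'u' => 'úùûüű', 'n' => 'ñ', 'c' => 'ç',
  'A' => 'ÁÀÂÄ', 'E' => 'ÉÈÊË', 'I' => 'Í', 'O' => 'ÓÖŐ',
  'U' => 'ÚÜŰ', 'N' => 'Ñ', 'C' => 'Ç'
}

# Build the two String#tr argument lists so that every accented character
# lines up with its base letter (e.g. 'áàâäãå' pairs with 'aaaaaa').
ACCENTED   = ACCENT_GROUPS.values.join
UNACCENTED = ACCENT_GROUPS.map { |base, variants| base * variants.length }.join

# Strip the accents, then downcase, so the key ignores both accents and case.
def collation_key(str)
  str.tr(ACCENTED, UNACCENTED).downcase
end

def accent_insensitive_equal?(a, b)
  collation_key(a) == collation_key(b)
end

puts accent_insensitive_equal?('Acsády', 'ACSADY')  # true
puts %w[Zoltán Acsády Ábel].sort_by { |s| collation_key(s) }.join(', ')
# Ábel, Acsády, Zoltán
```

Sorting by the key keeps 'Ábel' next to the other A names, whereas a plain byte-wise sort would put it after 'Zoltán'.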