I need to remove Latin characters like accents or "ñ" in Ruby. I tried using force_encoding('UTF-8') but it didn't work.

the Tin Man
jgiunta
  • Why do you want to remove them? – SLaks Apr 24 '12 at 22:04
  • What about non-Latin characters? – SLaks Apr 24 '12 at 22:04
  • Do you actually want to convert them to their non-accented ASCII equivalent? – Mark Thomas Apr 24 '12 at 22:14
  • Duplicate of [Transliteration in ruby](http://stackoverflow.com/questions/1726404/transliteration-in-ruby) or [Transliteration with Iconv in Ruby](http://stackoverflow.com/questions/4410340/transliteration-with-iconv-in-ruby) or [How Do I Replace Accented Latin Characters in Ruby](http://stackoverflow.com/questions/225471/how-do-i-replace-accented-latin-characters-in-ruby). – Phrogz Apr 24 '12 at 22:23
  • Exactly, I need to convert é to e and ñ to n! Is the best solution to use gsub rather than forcing an explicit encoding? – jgiunta Apr 25 '12 at 12:23

1 Answer

This piece of code, which I have used in other answers about Ruby encoding, has proved effective most of the time. Make sure your script itself is saved in UTF-8 encoding:

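t = "doña"
# Reinterpret the bytes in the locale's charset, then transcode them to UTF-8.
# The result depends on the locale; the output below is from a console using a DOS code page such as CP850.
p t.force_encoding(Encoding.locale_charmap).encode('UTF-8')
#=> "do\u251C\u2592a"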
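
As a side note on why calling force_encoding('UTF-8') on its own leaves the accents in place: it only changes the encoding label attached to the string, whereas encode actually converts the bytes. A minimal sketch, with illustrative variable names and a `\u` escape in the literal so the result does not depend on how the file is saved:

```ruby
# force_encoding relabels the existing bytes; encode transcodes them.
s = "do\u00F1a"                              # "doña" as a UTF-8 string

relabeled = s.dup.force_encoding('UTF-8')    # same bytes, same label
latin1    = s.encode('ISO-8859-1')           # bytes converted to Latin-1

p relabeled == s          # true: nothing was removed or replaced
p s.bytes.length          # 5: the accented letter takes two bytes in UTF-8
p latin1.bytes.length     # 4: the accented letter takes one byte in ISO-8859-1
```

In both cases the accented letter is still present, so neither call is a way to strip accents.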

If you want to replace the characters rather than re-encode them, there are libraries for that, but you could also use a simple regular expression:

t = "déjà"
puts t.gsub(/[éèàùµñçêï]/, '?') #=> d?j?

EDIT: I noticed in the comments that you want to replace each accented character with its plain equivalent. You can do that as follows:

# tr maps the n-th character of the first string to the n-th character of the second.
p string_with_special_chars.tr(
"ÀÁÂÃÄÅàáâãäåĀāĂăĄąÇçĆćĈĉĊċČčÐðĎďĐđÈÉÊËèéêëĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħÌÍÎÏìíîïĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłÑñŃńŅņŇňŉŊŋÒÓÔÕÖØòóôõöøŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšſŢţŤťŦŧÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųŴŵÝýÿŶŷŸŹźŻżŽž",
"AAAAAAaaaaaaAaAaAaCcCcCcCcCcDdDdDdEEEEeeeeEeEeEeEeEeGgGgGgGgHhHhIIIIiiiiIiIiIiIiIiJjKkkLlLlLlLlLlNnNnNnNnnNnOOOOOOooooooOoOoOoRrRrRrSsSsSsSssTtTtTtUUUUuuuuUuUuUuUuUuUuWwYyyYyYZzZzZz")
peter
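
The tr approach from the answer above can be wrapped in a small helper so it is easier to reuse. This is only a sketch: strip_accents and the two constants are hypothetical names that do not come from the answer, and the table is deliberately shortened to a few characters. The full lists from the answer should work the same way, as long as both strings stay the same length.

```ruby
# Hypothetical helper around String#tr; the table is shortened for the example.
ACCENTED = "ÀÁÈÉÌÍÒÓÙÚàáèéìíòóùúÑñÇçÜü"
PLAIN    = "AAEEIIOOUUaaeeiioouuNnCcUu"

def strip_accents(text)
  # tr maps the n-th character of ACCENTED to the n-th character of PLAIN
  # and leaves every other character untouched.
  text.tr(ACCENTED, PLAIN)
end

puts strip_accents("doña")   # prints "dona"
puts strip_accents("déjà")   # prints "deja"
```

Because tr pairs characters by position, a single extra or missing character in either string shifts every mapping after it, so it is worth checking that the two strings have equal length whenever the table is edited.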