2

Is there any better alternative to this?

name.gsub('è','e').gsub('à','a').gsub('ò','o').gsub('ì','i').gsub('ù','u')

thanks

Luca Romagnoli
  • 2
    This sounds like a bad idea. Why would you want to remove the accents? Also, there are about 10 different ways to accent a letter in Unicode. You are only showing the grave accent. – John Gietzen Jan 30 '10 at 14:51
  • See: http://en.wikipedia.org/wiki/Diacritic for a list of different marks that you may have missed. – John Gietzen Jan 30 '10 at 14:52
  • Agreed with JG. Whatever it is that demands unaccented characters is at fault. – roufamatic Jan 30 '10 at 15:02
  • 2
    Seems like a duplicate of http://stackoverflow.com/questions/225471/how-do-i-replace-accented-latin-characters-in-ruby to me – James A. Rosen Jan 30 '10 at 17:50
  • @John Gietzen: There are very legitimate uses for this. Try googling for "creme brulee". – Thomas Jan 30 '10 at 19:29
  • @Thomas I don't see your point. Try googling for "Crème brûlée." Same results. – Ben Jan 30 '10 at 20:56
  • 1
    @Ben: That is exactly my point. Those characters are treated as equal. – Thomas Jan 30 '10 at 22:28

6 Answers

9

Use tr.

Maybe like string.tr('èàòìù', 'eaoiu').
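
As a minimal sketch of the tr approach: the two arguments are matched position by position, so each accented vowel maps to the plain letter at the same index. The constant and method names here are made up for illustration, and only the five grave-accented vowels from the question are covered.

```ruby
# Characters in ACCENTED are replaced by the character at the
# same position in PLAIN (hypothetical names, for illustration).
ACCENTED = 'èàòìù'
PLAIN    = 'eaoiu'

def strip_grave_accents(name)
  # String#tr returns a new string; the original is left untouched.
  name.tr(ACCENTED, PLAIN)
end

puts strip_grave_accents('città è più')  # prints "citta e piu"
```

Characters outside the first argument pass through unchanged, so any other accent (é, ü, ñ, ...) would need to be added to both strings.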

Anonymous
1
substitutes = {'è'=>'e', 'à'=>'a', 'ò'=>'o', 'ì'=>'i', 'ù'=>'u'}
substitutes.each do |old, new|
  name.gsub!(old, new)
end
end

Or you could use an extension of String such as this one to do it for you.
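
A possible single-pass variant of the same idea, assuming Ruby 1.9 or later: gsub accepts a hash as its second argument and looks up each match in it. The names SUBSTITUTES, ACCENT_PATTERN and replace_accents are hypothetical.

```ruby
# Same substitution table as above, frozen so it can be shared safely.
SUBSTITUTES = { 'è' => 'e', 'à' => 'a', 'ò' => 'o', 'ì' => 'i', 'ù' => 'u' }.freeze

# Build one pattern that matches any key of the table.
ACCENT_PATTERN = Regexp.union(SUBSTITUTES.keys)

def replace_accents(name)
  # Each match is replaced by its value in the hash; returns a new string.
  name.gsub(ACCENT_PATTERN, SUBSTITUTES)
end

puts replace_accents('Niccolò è là')  # prints "Niccolo e la"
```

This scans the string once instead of once per table entry, and it does not mutate name.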

Kaleb Brasee
1

If you really want a full solution, try pulling the tables from Perl's Unidecode module. After translating those tables to Ruby, you'll want to loop over each character of the input, substituting the table's value for that character.
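
The per-character loop could look roughly like this. TRANSLIT_TABLE is a tiny hand-written stand-in, not the actual Unidecode data, and transliterate is a hypothetical name; a real port would load the full tables instead.

```ruby
# Hypothetical, deliberately tiny stand-in for the Unidecode tables.
# Values may be longer than one character (e.g. 'ß' => 'ss').
TRANSLIT_TABLE = {
  'è' => 'e', 'é' => 'e', 'à' => 'a',
  'ß' => 'ss', 'æ' => 'ae', 'ø' => 'o'
}.freeze

def transliterate(text)
  # Look up every character; characters missing from the table are kept as-is.
  text.each_char.map { |char| TRANSLIT_TABLE.fetch(char, char) }.join
end

puts transliterate('Straße è')  # prints "Strasse e"
```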

eswald
0

Taking a wild stab in the dark, but if you're trying to remove the accented characters because you're using a legacy text encoding format, you should look at Iconv.

A great introduction to the subject: http://blog.grayproductions.net/articles/encoding_conversion_with_iconv
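
For illustration only, here is what an encoding conversion can look like in Ruby. Note that this sketch swaps in the built-in String#encode (Ruby 1.9+) rather than Iconv itself, and the method names are made up.

```ruby
def to_latin1(text)
  # Convert to ISO-8859-1, which can represent 'à', 'è', etc. as single bytes.
  text.encode('ISO-8859-1')
end

def to_ascii_lossy(text)
  # Characters with no US-ASCII equivalent are replaced with '?'.
  text.encode('US-ASCII', undef: :replace, replace: '?')
end

puts to_latin1('città').bytesize  # prints 5 (the UTF-8 source is 6 bytes)
puts to_ascii_lossy('città')      # prints "citt?"
```

If the target encoding can hold the accented letters, nothing needs to be stripped at all; the lossy variant shows what happens when it cannot.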

gaqzi
0

In case you are wondering, the technical terms for what you want to do are case folding and possibly Unicode normalization (and sometimes collation).

Here is a case folding configuration for ThinkingSphinx to give you an idea of how many characters you need to worry about.
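
One way to apply Unicode normalization for this, sketched under the assumption of Ruby 2.2 or later (where String#unicode_normalize is available): decompose each character into a base letter plus combining marks (NFD), then drop the marks. The method name strip_diacritics is hypothetical.

```ruby
def strip_diacritics(text)
  # NFD splits 'è' into 'e' + U+0300 (combining grave accent);
  # \p{Mn} matches nonspacing combining marks, which are then removed.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_diacritics('Crème brûlée')  # prints "Creme brulee"
```

This covers any accent that decomposes, not just the grave accent, but letters with no decomposition (such as 'ø' or 'ß') are left as they are.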

srboisvert
0

If JRuby is an option, see the answer to my question:

How do I detect unicode characters in a Java string?

It deals with removing accents from letters, using a Normalizer. You could access that class from JRuby.

Geo