Better alternative for letter substitution

Question

Is there any better alternative to this?

name.gsub('è','e').gsub('à','a').gsub('ò','o').gsub('ì','i').gsub('ù','u')

thanks

This sounds like a bad idea. Why would you want to remove the accents? Also, there are about 10 different ways to accent a letter in Unicode. You are only showing the grave accent. — John Gietzen, Jan 30 '10 at 14:51
See: http://en.wikipedia.org/wiki/Diacritic for a list of different marks that you may have missed. — John Gietzen, Jan 30 '10 at 14:52
Agreed with JG. Whatever it is that demands unaccented characters is at fault. — roufamatic, Jan 30 '10 at 15:02
Seems to me like a duplicate of http://stackoverflow.com/questions/225471/how-do-i-replace-accented-latin-characters-in-ruby — James A. Rosen, Jan 30 '10 at 17:50
@John Gietzen: There are very legitimate uses for this. Try googling for "creme brulee". — Thomas, Jan 30 '10 at 19:29
@Thomas I don't see your point. Try googling for "Crème brûlée." Same results. — Ben, Jan 30 '10 at 20:56
@Ben: That is exactly my point. Those characters are treated as equal. — Thomas, Jan 30 '10 at 22:28

score 9 · Answer 1 · answered Jan 30 '10 at 15:02

Use tr.

Maybe like string.tr('èàòìù', 'eaoiu').
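
A minimal sketch of the tr approach, in case it helps. The sample string is made up for illustration, and `name` just stands in for the asker's variable. Note that tr maps characters by position, so both arguments must list the letters in the same order.

```ruby
# Hypothetical sample input; `name` stands in for the asker's variable.
name = 'caffè, città, però, così, più'

# String#tr replaces each character found in the first argument with the
# character at the same position in the second argument.
# It returns a new string and leaves `name` unchanged.
plain = name.tr('èàòìù', 'eaoiu')

puts plain
```

This only covers the five grave-accented lowercase vowels from the question; any other accented character passes through untouched.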

Anonymous

Kaleb Brasee · Accepted Answer · 2010-01-30T15:00:11.063

1

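# Map each accented character to its unaccented replacement.
substitutes = {'è'=>'e', 'à'=>'a', 'ò'=>'o', 'ì'=>'i', 'ù'=>'u'}
# gsub! modifies name in place, so name must not be frozen.
substitutes.each do |old, new|
  name.gsub!(old, new)
end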

Or you could use an extension of String such as this one to do it for you.
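
As a possible variation on the loop above, gsub also accepts a hash as its second argument, so the whole table can be applied in one pass. This is only a sketch: the constant `SUBSTITUTES` and the method name `strip_grave` are made up for illustration.

```ruby
# Same table as in the answer, frozen so it cannot be modified by accident.
SUBSTITUTES = {'è'=>'e', 'à'=>'a', 'ò'=>'o', 'ì'=>'i', 'ù'=>'u'}.freeze

# Regexp.union builds a pattern that matches any one of the table's keys.
ACCENTED = Regexp.union(SUBSTITUTES.keys)

# Hypothetical helper: returns a new string, leaving the argument untouched.
# When gsub is given a hash, each match is replaced by the hash value
# stored under the matched text.
def strip_grave(name)
  name.gsub(ACCENTED, SUBSTITUTES)
end

puts strip_grave('già là')
```

Unlike gsub!, this version does not mutate its input, so it also works on frozen strings.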

edited Jan 30 '10 at 15:00

answered Jan 30 '10 at 14:53

score 1 · Answer 3 · answered Jan 30 '10 at 15:13

If you really want a full solution, try pulling the tables from Perl's Unidecode module. After translating those tables to Ruby, you'll want to loop over each character of the input, substituting the table's value for that character.
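
The per-character lookup described above might look roughly like this. The table here is a tiny hand-written stand-in, not the actual Unidecode data, and the names `TABLE` and `transliterate` are hypothetical.

```ruby
# Tiny hypothetical excerpt; a real table would cover far more characters.
# Values may be longer than one character, which tr cannot express.
TABLE = {
  'è' => 'e', 'é' => 'e', 'à' => 'a',
  'ß' => 'ss', 'æ' => 'ae'
}.freeze

# Walk the input one character at a time and substitute the table's value,
# falling back to the original character when there is no entry.
def transliterate(text)
  text.each_char.map { |ch| TABLE.fetch(ch, ch) }.join
end

puts transliterate('straße è là')
```

The fallback in `fetch` means characters missing from the table are kept as they are rather than dropped.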

eswald

score 0 · Answer 4 · answered Jan 30 '10 at 18:33

Taking a wild stab in the dark, but if you're trying to remove the accented characters because you're using a legacy text encoding format, you should look at Iconv.

A great introduction to the subject: http://blog.grayproductions.net/articles/encoding_conversion_with_iconv

score 0 · Answer 5 · answered Jan 30 '10 at 19:25

In case you are wondering, the technical terms for what you want to do are case folding and possibly Unicode normalization (and sometimes collation).

Here is a case folding configuration for ThinkingSphinx to give you an idea of how many characters you need to worry about.
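
For the Unicode normalization part, here is a rough sketch. It assumes Ruby 2.2 or later, where String#unicode_normalize is available, and a UTF-8 string; the method name `strip_accents` is made up for illustration.

```ruby
# Hypothetical helper. NFD normalization splits each accented letter into
# a base letter followed by combining marks; the gsub then deletes those
# marks (Unicode general category Mn, "nonspacing mark").
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_accents('Crème brûlée')
```

This handles any accent that decomposes, not just the grave. Letters with no decomposition, such as 'ø' or 'ß', are left as they are and would still need a lookup table.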

score 0 · Answer 6 · edited May 23 '17 at 11:54

If JRuby is an option, see the answer to my question:

How do I detect unicode characters in a Java string?

It deals with removing accents from letters, using a Normalizer. You could access that class from JRuby.

answered Jan 30 '10 at 19:32

Geo
