Using unique() and == to match accented vs. non-accented characters

Question

I'm putting together some tables that look almost the same, except that some characters appear accented in some and non-accented in others. For instance, "André" sometimes reads "Andre", and "Flávio" sometimes reads "Flavio". I need to treat all variations as equal, but unique() considers them different. I thought about changing all accented characters to non-accented ones and then using unique(), but maybe there is another, faster option.

Later I need to make the same accent-insensitive comparison using ==, so I'm thinking about removing all accents from a copy of each table and doing the comparison on the copies. Please tell me if there's a different, better approach.

Your approach seems appropriate. Note `iconv("André",to='ASCII//TRANSLIT') == "Andre"` — A. Webb, Aug 12 '15 at 18:58
This looks much better than converting every different possible accent, @A.Webb. I'll accept that as an answer. Thank you! — Rodrigo, Aug 12 '15 at 19:06
relevant (almost a duplicate, one answer with `stringi::stri_trans_general`) : https://stackoverflow.com/questions/13610319/convert-accented-characters-into-ascii-character — moodymudskipper, Aug 17 '17 at 08:32

score 6 · Accepted Answer · edited Aug 17 '17 at 08:23


The approach of removing accents prior to comparison seems appropriate for your purposes. Note that such a facility exists in iconv with the TRANSLIT flag:

iconv(c("André","Flávio"),to='ASCII//TRANSLIT')
#> [1] "Andre"  "Flavio"
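To connect this back to unique(), here is a minimal sketch that uses the transliterated vector only as a comparison key. The sample vector `people` and the variable `key` are hypothetical, and the strings are assumed to be UTF-8. The result of `//TRANSLIT` depends on the platform's iconv implementation, so it is worth inspecting `key` on your own system before relying on it.

```r
# Hypothetical sample data; the names are taken from the question.
# \u escapes keep the strings UTF-8 regardless of the file encoding.
people <- c("Andr\u00e9", "Andre", "Fl\u00e1vio", "Flavio")

# Accent-stripped copy, used only as a comparison key.
# Transliteration is platform-dependent, so check `key` on your system.
key <- iconv(people, from = "UTF-8", to = "ASCII//TRANSLIT")

# Distinct values, ignoring accents.
unique(key)

# Keep the first original spelling seen for each distinct key.
people[!duplicated(key)]
```

Using `duplicated()` on the key rather than `unique()` on the data keeps the original, accented spelling in the output.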

answered Aug 12 '15 at 19:11 by A. Webb · edited Aug 17 '17 at 08:23 by moodymudskipper

Yes, I made a function that does this and converts to uppercase at the same time: `ICONV <- function(x) { return(iconv(toupper(x),to='ASCII//TRANSLIT')) }` Thank you! – Rodrigo, Aug 12 '15 at 19:14
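For the == part of the question, the helper from the comment above might be used roughly as follows. This is only a sketch: the vectors `a` and `b` are hypothetical stand-ins for a column from each table, and the explicit `from = "UTF-8"` is an added assumption about the input encoding. Both `toupper()` on non-ASCII characters and `//TRANSLIT` depend on the locale and platform, so results may differ between systems.

```r
# Helper from the comment above, with `from` made explicit
# (assumption: the inputs are UTF-8).
ICONV <- function(x) {
  iconv(toupper(x), from = "UTF-8", to = "ASCII//TRANSLIT")
}

# Hypothetical vectors standing in for a column from each table.
a <- c("Andr\u00e9", "Fl\u00e1vio", "Rodrigo")
b <- c("ANDRE", "flavio", "Maria")

# Element-wise comparison, ignoring accents and case.
ICONV(a) == ICONV(b)

# Position in `b` of each element of `a`; NA where there is no match.
match(ICONV(a), ICONV(b))
```

Normalizing both sides with the same function keeps the comparison symmetric, and the original tables stay untouched.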
