Correcting several wrong names in a data.frame in R (approach recommendation)

Question

I have a file with around 260 customers, but the name of the same customer is spelled in many different ways, as in the following example:

Cesar Fereira
Cesar Féreira   
César Fereira   
Cezar Fereira

Because of this I have about 1000 different names. I would like a recommendation on how to correct the names in bulk, for all customers, using an R package or some other approach.

Various distances are used in such cases, but I am not sure there is a way to actually do the trick without the user checking the results. I would suggest something like the stringdist package. – NpT, Oct 25 '19 at 13:13
[Relevant](https://stackoverflow.com/questions/6044112/how-to-measure-similarity-between-strings) — Sotos, Oct 25 '19 at 13:16

Janhoo · Accepted Answer · 2019-10-25T14:18:03.727

If you are dealing not only with accents but also with alternative letters, `agrep` might be a solution.

```r
d <- c("Cesar Fereira", "Cesar Féreira", "César Fereira ", "Cezar Fereira")
# For each name, list every name in d that approximately matches it
lapply(d, function(x) agrep(x, d, max.distance = 0.1, ignore.case = TRUE, value = TRUE))
```

EDIT: Expanding on Parfait's proposal, you could compare every pair of names:

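```r
library(dplyr)
d <- c("Cesar Fereira", "Cesar Féreira", "César Fereira ", "Cezar Fereira", "Zebra", "Zébra")
# agrepl() takes a single pattern, so compare each pair explicitly with mapply()
expand.grid(Var1 = d, Var2 = d, stringsAsFactors = FALSE) %>%
  mutate(same = mapply(agrepl, Var1, Var2, MoreArgs = list(max.distance = 7), USE.NAMES = FALSE))
```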

Playing around with `max.distance`, the selectivity does not seem to be very good, as you can see. Bummer.
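
One likely reason for the poor selectivity: according to the `agrep` documentation, a `max.distance` below 1 is a fraction of the pattern length, while an integer such as 7 is an absolute number of allowed edits, which is more than the whole length of "Zebra". As an alternative sketch (not part of the original answer), base R's `adist()` returns the actual edit distances, and these can be clustered with `hclust()` and cut at a small threshold. The threshold of 2 edits and the rule "first spelling in each group wins" are assumptions for illustration; the groups should still be checked by eye.

```r
# Sample names from the question, plus the two unrelated "Zebra" spellings
d <- c("Cesar Fereira", "Cesar Féreira", "César Fereira ", "Cezar Fereira",
       "Zebra", "Zébra")

x <- trimws(d)           # drop stray leading/trailing spaces
dist_mat <- adist(x)     # pairwise generalized Levenshtein distances

# Single-linkage clustering: names end up in the same group when a chain
# of small edits connects them
tree <- hclust(as.dist(dist_mat), method = "single")
group <- cutree(tree, h = 2)   # assumed threshold: merge up to 2 edits

# Assumption: the first spelling seen in each group is the canonical one
canonical <- x[match(group, group)]

data.frame(original = d, group = group, corrected = canonical)
```

With real data it would probably be safer to pick the most frequent spelling in each group, or to map each group to a manually curated name.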

edited Oct 25 '19 at 14:18

answered Oct 25 '19 at 13:28

Janhoo



You can even use [`agrepl`](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/agrep) inside `ifelse` for OP's need of *correction*: `clean_d <- ifelse(agrepl("Cesar Fereira", d, max.distance = 0.1, ignore.case = TRUE), "Cesar Fereira", d)` – Parfait Oct 25 '19 at 13:46
@Janhoo, I wanted a recommendation and I found a solution, thanks. =D – Curious G. Oct 25 '19 at 15:13
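
Parfait's `ifelse` one-liner can be repeated for every known correct spelling. A minimal sketch, assuming you can supply a vector of correct names yourself; the `canonical` vector and the `fix_names()` helper are hypothetical and do not come from the thread:

```r
d <- c("Cesar Fereira", "Cesar Féreira", "César Fereira ", "Cezar Fereira",
       "Zebra", "Zébra")

# Hypothetical list of correct spellings, e.g. taken from a clean customer table
canonical <- c("Cesar Fereira", "Zebra")

# Hypothetical helper: overwrite every approximate match of a canonical name
fix_names <- function(x, canonical, max.distance = 0.1) {
  for (name in canonical) {
    hit <- agrepl(name, x, max.distance = max.distance, ignore.case = TRUE)
    x[hit] <- name
  }
  x
}

clean_d <- fix_names(d, canonical)
table(clean_d)   # how many rows ended up under each corrected name
```

Because `agrepl` looks for the pattern inside each string, short canonical names can match longer unrelated ones, so it is worth reviewing the result before overwriting the original column.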
