How to fuzzy match text in a column and then replace with a consensus in R

Asked Mar 16 '16 at 11:33

Active Mar 16 '16 at 12:38

I have a data frame as follows:

FName  LName  
Ayeko   Seki
Ayeko   Seki
Ayeko   Seki
Ayeko   Zeki
Aveko   Seki
Avoo    Zooki
Jacques Bergmann.
Jacques Burgman
J       Bergman
Jacques Bergmann
Jacques Bergmann
Jacques Bergmann
Jacques Bergmann
David   Goliath

J Bergman, Jacques Bergmann., Jacques Burgman and Jacques Bergmann are the same person, as are the first five entries (the Ayeko/Aveko Seki/Zeki variants). The sixth entry (Avoo Zooki) and the last entry (David Goliath) are different people. I would like to fuzzy match the names across the two columns and then replace each group of matches with a consensus (or, alternatively, the most common name among the fuzzy matches), so that the output data frame is:

FName  LName  
Ayeko   Seki
Avoo    Zooki
Jacques Bergmann
David   Goliath

I have tried using stringdist(), but I am stuck on a) deriving the consensus match and b) replacing the matches with that consensus.
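For illustration, here is a minimal sketch of one way to do both steps with base R only. This is not a method from the question or the comments. It assumes that `utils::adist()` (Levenshtein distance) is an acceptable stand-in for `stringdist()`, that a first name counts as matching when one value is a prefix of the other (so "J" matches "Jacques"), and that a normalised distance cutoff of 0.3 is appropriate. The helper names `norm_dist` and `most_common` and the cutoff are hypothetical choices tuned to this toy data and would likely need adjusting on real data.

```r
# Toy data from the question
df <- data.frame(
  FName = c("Ayeko", "Ayeko", "Ayeko", "Ayeko", "Aveko", "Avoo", "Jacques",
            "Jacques", "J", "Jacques", "Jacques", "Jacques", "Jacques", "David"),
  LName = c("Seki", "Seki", "Seki", "Zeki", "Seki", "Zooki", "Bergmann.",
            "Burgman", "Bergman", "Bergmann", "Bergmann", "Bergmann",
            "Bergmann", "Goliath"),
  stringsAsFactors = FALSE
)

# Levenshtein distance scaled by the longer string's length (0 = identical)
norm_dist <- function(x) {
  len <- nchar(x)
  adist(x) / pmax(outer(len, len, pmax), 1)
}

# Most frequent value in a vector (ties resolved alphabetically by table())
most_common <- function(x) names(which.max(table(x)))

# First names: treat an abbreviation (prefix) as a perfect match
fn <- df$FName
fd <- norm_dist(fn)
is_prefix <- outer(fn, fn, function(a, b) startsWith(a, b) | startsWith(b, a))
fd[is_prefix] <- 0

# Last names: strip punctuation such as the trailing "." before comparing
ld <- norm_dist(gsub("[[:punct:]]", "", df$LName))

# Two rows are close only if BOTH name parts are close
d <- pmax(fd, ld)

# a) Group rows: single-linkage clustering, cut at the assumed cutoff 0.3
cl <- cutree(hclust(as.dist(d), method = "single"), h = 0.3)

# b) Replace every value with the most common value in its group
df$FName <- ave(df$FName, cl, FUN = most_common)
df$LName <- ave(df$LName, cl, FUN = most_common)

# Collapse to one row per person
result <- unique(df[c("FName", "LName")])
rownames(result) <- NULL
print(result)
```

On the sample data this groups rows 1 to 5, row 6, rows 7 to 13 and row 14 separately, so `result` contains the four rows requested above. Comparing first and last names separately matters here: on the full-name string, "J Bergman" is further from "Jacques Bergmann" than "Avoo Zooki" is from "Aveko Seki", so a single cutoff on the combined string cannot separate them.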

edited Mar 16 '16 at 12:38

phiver

asked Mar 16 '16 at 11:33

Sebastian Zeki

Please show how you used `stringdist` – talat Mar 16 '16 at 12:06
Have a look at [this answer](http://stackoverflow.com/questions/35904182/word2vec-for-text-mining-categories/35904557#35904557) - you can apply it 1:1 on your example. – lukeA Mar 16 '16 at 12:25
Hi lukeA. Thanks for the suggestion, but this doesn't quite do it. How do I then replace the original column text values with the fuzzy-matched ones? – Sebastian Zeki Mar 16 '16 at 12:53

0 Answers