I have a data frame as follows:

FName  LName  
Ayeko   Seki
Ayeko   Seki
Ayeko   Seki
Ayeko   Zeki
Aveko   Seki
Avoo    Zooki
Jacques Bergmann.
Jacques Burgman
J       Bergman
Jacques Bergmann
Jacques Bergmann
Jacques Bergmann
Jacques Bergmann
David   Goliath

J Bergman, Jacques Bergmann., Jacques Burgman and Jacques Bergmann are the same person, as are the first five entries. The sixth entry (Avoo Zooki) and the last (David Goliath) are different people. I would like to fuzzy match the names across the two columns and then replace each group of matches with a consensus name (or, alternatively, the most common name among the fuzzy matches), so that the output data frame is:

FName  LName  
Ayeko   Seki
Avoo    Zooki
Jacques Bergmann
David   Goliath

I have tried using stringdist(), but I am stuck on two steps: a) getting the consensus match for each group of similar names, and b) replacing the matched names with that consensus.

phiver
Sebastian Zeki
  • Please show how you used `stringdist` – talat Mar 16 '16 at 12:06
  • Have a look at [this answer](http://stackoverflow.com/questions/35904182/word2vec-for-text-mining-categories/35904557#35904557) - you can apply it 1:1 on your example. – lukeA Mar 16 '16 at 12:25
  • Hi lukeA. Thanks for the suggestion, but this doesn't quite do it. How do I then replace the original column text values with the fuzzy matched ones? – Sebastian Zeki Mar 16 '16 at 12:53
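
For the two steps asked about (finding a consensus and writing it back into the columns), here is a minimal sketch in base R. It is not the method from the linked answer, and it uses `adist()` from base R rather than `stringdist()`. Several choices are assumptions made for this example only: punctuation is stripped first, a first name that is a prefix of another (such as "J" and "Jacques") counts as distance 0, names are grouped by complete-linkage clustering cut at a combined edit distance of 2, and the consensus is simply the most frequent name in each group. The helper names `clean` and `is_prefix` are hypothetical, and the threshold would likely need tuning on real data.

```r
# Example data from the question.
df <- data.frame(
  FName = c("Ayeko", "Ayeko", "Ayeko", "Ayeko", "Aveko", "Avoo", "Jacques",
            "Jacques", "J", "Jacques", "Jacques", "Jacques", "Jacques", "David"),
  LName = c("Seki", "Seki", "Seki", "Zeki", "Seki", "Zooki", "Bergmann.",
            "Burgman", "Bergman", "Bergmann", "Bergmann", "Bergmann",
            "Bergmann", "Goliath"),
  stringsAsFactors = FALSE
)

# Assumption: punctuation such as the trailing "." is noise, so strip it.
clean <- function(x) trimws(gsub("[[:punct:]]", "", x))
df$FName <- clean(df$FName)
df$LName <- clean(df$LName)

# Compare each distinct (FName, LName) pair only once.
u <- unique(df)

# Edit distance between first names; an initial or any other prefix
# (e.g. "J" vs "Jacques") is treated as a perfect match (assumption).
is_prefix <- function(a, b) substr(b, 1, nchar(a)) == a
pre <- outer(u$FName, u$FName, is_prefix)
d_first <- adist(u$FName)
d_first[pre | t(pre)] <- 0

# Combined distance: first-name distance plus last-name edit distance.
d <- as.dist(d_first + adist(u$LName))

# Complete linkage keeps every pair inside a group within the threshold.
# The cut height of 2 is an assumption tuned to this small example.
cl <- cutree(hclust(d, method = "complete"), h = 2)

# Map the cluster labels back onto every row of the original data.
key <- paste(df$FName, df$LName)
df$cluster <- cl[match(key, paste(u$FName, u$LName))]

# a) Consensus: for each row, the index of the first row holding the
#    most frequent name in that row's cluster.
rep_row <- ave(seq_len(nrow(df)), df$cluster, FUN = function(i) {
  counts <- table(key[i])
  i[match(names(which.max(counts)), key[i])]
})

# b) Replacement: overwrite both columns with the consensus name.
df$FName <- df$FName[rep_row]
df$LName <- df$LName[rep_row]

# Collapse to one row per person, in order of first appearance.
result <- unique(df[, c("FName", "LName")])
rownames(result) <- NULL
print(result)
```

After step b), `df` still has all 14 rows with the names replaced, so the deduplication at the end is optional. Complete linkage is used here instead of single linkage so that a chain of small differences (Seki, Zeki, Zooki) does not pull distinct people into one group.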
