
I have two Excel sheets with insurance claims data from two different insurance providers. I need to find individuals who have filed claims under both providers.

I would like something that pairs names when they are likely to belong to the same person, and does nothing when there is no sufficiently similar name in the other sheet. From what I have read, I think I need fuzzy string matching for this (and maybe the Damerau-Levenshtein distance). I know R has a string distance function, adist, but I am struggling to learn to use it properly.

For an example:

Provider 1:
Ms. Smith        35        F        Portland,OR             Cardiac
Adam Jacobs      27        M        San Francisco, CA       Gynecology
Emily Lo         19        F        Portland,OR             Ortho
Frances Wu       33        F        Dallas, TX              ENT

Provider 2: 
Clara Smith      35        F        Portland,OR              Cardiac
Bill White       29        M        San Francisco, CA        Ortho
Emily S. Lo      19        F        Portland,OR              Ortho
Dev Patel        22        M        Dallas, TX               Neuro

Here it should recognize that Emily S. Lo is the same person as Emily Lo, and that Clara Smith is the same person as Ms. Smith, and give me a list with their names and information. How do I do this?
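To make the goal concrete, here is a minimal sketch of the kind of pairing described above, using base R's adist on the example rows. The data frame and column names (`p1`, `p2`, `name`, `age`, `sex`, `city`, `dept`) and the 0.5 cutoff are assumptions for illustration only. Note that adist computes a generalized Levenshtein distance, not Damerau-Levenshtein.

```r
# Hypothetical data frames standing in for the two sheets (column names are assumed).
p1 <- data.frame(
  name = c("Ms. Smith", "Adam Jacobs", "Emily Lo", "Frances Wu"),
  age  = c(35, 27, 19, 33),
  sex  = c("F", "M", "F", "F"),
  city = c("Portland, OR", "San Francisco, CA", "Portland, OR", "Dallas, TX"),
  dept = c("Cardiac", "Gynecology", "Ortho", "ENT"),
  stringsAsFactors = FALSE
)
p2 <- data.frame(
  name = c("Clara Smith", "Bill White", "Emily S. Lo", "Dev Patel"),
  age  = c(35, 29, 19, 22),
  sex  = c("F", "M", "F", "M"),
  city = c("Portland, OR", "San Francisco, CA", "Portland, OR", "Dallas, TX"),
  dept = c("Cardiac", "Ortho", "Ortho", "Neuro"),
  stringsAsFactors = FALSE
)

# adist() returns a matrix of edit distances:
# rows correspond to p1$name, columns to p2$name.
d <- adist(p1$name, p2$name)

# Divide by the length of the longer name in each pair so scores fall in [0, 1].
max_len <- outer(nchar(p1$name), nchar(p2$name), pmax)
d_norm  <- d / max_len

# For each Provider 1 name, find the closest Provider 2 name and its score.
best  <- apply(d_norm, 1, which.min)
score <- d_norm[cbind(seq_len(nrow(p1)), best)]

# Assumed cutoff: keep a pair only if the names are similar enough.
threshold <- 0.5
keep <- score < threshold

matches <- data.frame(
  name_p1 = p1$name[keep],
  name_p2 = p2$name[best[keep]],
  score   = round(score[keep], 2),
  p1[keep, c("age", "sex", "city", "dept")],
  stringsAsFactors = FALSE
)
print(matches)
```

Names with no close counterpart are simply dropped. On real data the cutoff would need tuning, and also requiring age and sex to agree would probably reduce false matches.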

I tried following this tutorial: http://bigdata-doctor.com/fuzzy-string-matching-survival-skill-tackle-unstructured-information-r/ Even with their data and their code copied and pasted unchanged, I keep getting a 0x0 result.

Amie
  • Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Dec 06 '16 at 11:08
  • Please read the link in the comment above and make your examples reproducible. – Sotos Dec 06 '16 at 13:47
