I have two data frames with numerous variables. The columns of primary concern are df1.organization_name and df2.LEGAL.NAME (I'm just using fully qualified, SQL-style names here).
df1 has dimensions of 15 x 2700, whereas df2 has dimensions of 10 x 40,000. Essentially, the only 'common' or 'matching' columns are these name fields.
I reviewed the post "Merging through fuzzy matching of variables in R", which was very helpful, but I can't figure out how to adapt its script to work with my data frames.
I keep getting this error:
Error in which(organization_name[i] == LEGAL.NAME) : object 'LEGAL.NAME' not found
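A minimal sketch of one way this error can arise, assuming the script refers to the column by its bare name rather than through the data frame. The toy df2 below is a hypothetical stand-in, not the real file:

```r
# Hypothetical stand-in for df2; only the column name is taken from the question
df2 <- data.frame(
  LEGAL.NAME = c("1 STOP LLC", "1 STAR BEVERAGE INC"),
  stringsAsFactors = FALSE
)

# A bare column name is looked up as a standalone object, which does not exist
msg <- tryCatch(
  which("1 STOP LLC" == LEGAL.NAME),
  error = function(e) conditionMessage(e)
)
print(msg)

# Qualifying the column with its data frame resolves the lookup
idx <- which("1 STOP LLC" == df2$LEGAL.NAME)
print(idx)
```

If that is the cause here, writing df2$LEGAL.NAME (or wrapping the loop in with(df2, ...)) should get past the error. It may also be worth checking names(df2), since the column could be spelled differently after import.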
Desired Matching and Outcome
What I am trying to do is compare every df1.organization_name against every df2.LEGAL.NAME and flag the pairs that are a very close match (say, >= 85% similar). Then, as in the script from the linked post, I want to put the matched customer name and the matched comparison name into a data.frame for later analysis.
So, if one of my customer names is 'Johns Hopkins Auto Repair' and one of my public list names is 'John Hopkins Microphone Repair', I would call that a good match. In that case I want an indicator appended to my customer list (in a new column) that says 'Partial Match', along with the matching name from the public list.
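The comparison described above can be sketched in base R as follows. This is only an illustration under stated assumptions: similarity is taken to be 1 minus the Levenshtein distance (from utils::adist) divided by the length of the longer string, names are upper-cased before comparing, and the tiny data frames and the new column names match_status and matched_name are hypothetical. A different similarity measure may rate the same pair of names quite differently.

```r
# Hypothetical stand-ins for the real data frames
df1 <- data.frame(
  organization_name = c("My Company LLC", "1 Stop LLC", "Shredder Partners"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  LEGAL.NAME = c("1 STOP LLC", "1 STAR BEVERAGE INC", "MY COMPANY, LLC"),
  stringsAsFactors = FALSE
)

# Normalise case so that differences in capitalisation do not count as edits
a <- toupper(df1$organization_name)
b <- toupper(df2$LEGAL.NAME)

threshold <- 0.85  # the ">= 85%" cut-off from the question

best_idx <- integer(length(a))
best_sim <- numeric(length(a))

# One df1 name at a time, so the full 2700 x 40,000 matrix is never held in memory
for (i in seq_along(a)) {
  d   <- drop(adist(a[i], b))                       # edit distance to every df2 name
  sim <- 1 - d / pmax(nchar(a[i]), nchar(b))        # scaled to a 0..1 similarity
  best_idx[i] <- which.max(sim)                     # closest df2 name
  best_sim[i] <- sim[best_idx[i]]
}

# Append the indicator and the matched public-list name to the customer list
is_match <- best_sim >= threshold
df1$match_status <- ifelse(is_match, "Partial Match", "No Match")
df1$matched_name <- ifelse(is_match, df2$LEGAL.NAME[best_idx], NA_character_)

print(df1)
```

Only the single best candidate per customer name is kept here; keeping every candidate above the threshold would need a list or a long-format data.frame instead.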
Example(s) of the dfs for text wrangling:
df1.organization_name (these are fake names b/c I can't post customer names)
- My Company LLC
- John Johns DBA John's Repair
- Some Company Inc
- Ninja Turtles LLP
- Shredder Partners
df2.LEGAL.NAME (these are real names from the open source file)
- $1 & UP STORE CORP.
- $1 store 0713
- LLC 0baid/munir/gazem
- 1 2 3 MONEY EXCHANGE LLC
- 1 BOY & 3 GIRLS, LLC
- 1 STAR BEVERAGE INC
- 1 STOP LLC
- 1 STOP LLC
- 1 STOP LLC DBA TIENDA MEXICANA LA SAN JOSE
- 1 Stop Money Centers, LLC/Richard