I have 2 dataframes of restaurant information to merge.
df1 = pd.DataFrame ({'Restaurant_Name': ['Apple', 'Banana', 'Orange', 'apple','apple1'],
'Postal Code': [12345, 12345, 54321, 54321,1111]})
df2 = pd.DataFrame ({'Restaurant_Name': ['apple', 'apple', 'Banana'],
'Postal Code': [12345, 54321, 12345],
'Phone':[100,200,300]})
- Each restaurant has a postal code (not unique, 2 restaurants can be located in the same place). So I cannot merge the dataframes based on postal code.
- But restaurants sharing the same postal code can be distinguished by their names.
- Restaurant names are spelt slightly different so I cannot merge based on restaurant names too
Ideally I want to produce a table that looks like this:
I tried to match the restaurant names based on fuzzy matching followed by a match of postal code, but was not able to get a very accurate result. I also tried to concatenate the restaurant name with postal code for each of the dataframe and do a fuzzy matching of the concatenated result but I don't think this is the best way.
Is there any way to achieve 100% accuracy in matching the two dataframes?