I have a pandas dataframe like so:
State City Population Year
ALABAMA Alexander City 14947 2020
ALABAMA Owens Cross Roads 9972 2020
ARIZONA Miami 1863 2020
....
and another dataframe like so:
State City val Year
ALABAMA Alexander city 55.60 2020
ALABAMA Owens Crossroads 22.09 2020
ARIZONA Miami 45.23 2020
....
Now, I would like to merge these dataframes by full match of State
& Year
, & partial match of City
as the city names are not standardized. Any similarity algorithm will do (soundex, Levenshtein, difflib's).
And finally get something like this:
State City val Year Population
ALABAMA Alexander city 55.60 2020 14947
ALABAMA Owens Crossroads 22.09 2020 9972
ARIZONA Miami 45.23 2020 1863
....
EDIT:
This is not the same as some other ques on SO (like: is it possible to do fuzzy match merge with python pandas?) as I want to merge on multiple columns, some exact match and one best partial match.