I have a dataframe with columns: state, county, and agency_name, and I want to do fuzzy matching on the agency name to another dataframe that has more variables about agency names. But i want to only fuzzy match names within the same state and county.
Dataset #1 looks like this:
State County Agency_Name
FL Broward ~name1
FL Dade name2@
MN Hennepin name11
MN Hennepin name3#
Dataset #2 has names that almost match the Agency_Names in Dataset #1
State County Agency_Name Address agency_code
FL Broward name1 address1 345
FL Dade name2 address2 654
MN Hennepin name1 address3 234
MN Hennepin name3 address4 776
I can select the best fuzzy match out of all names in the dataset using this:
from fuzzywuzzy import process
import rapidfuzz
df1['agency_match'] = df1['Agency_Name'].map(lambda x: process.extractOne(x,df2['Agency_Name'], scorer=rapidfuzz.string_metric.normalized_levenshtein)[0])
However, this match doesn't work because it matches Agency_Names from different states and counties. I need the fuzzy match to pick the best match only from within the same state and county.
What would be an elegant way to do this?