0

I have a dataframe with columns: state, county, and agency_name, and I want to do fuzzy matching on the agency name to another dataframe that has more variables about agency names. But i want to only fuzzy match names within the same state and county.
Dataset #1 looks like this:

State   County     Agency_Name 
FL      Broward    ~name1
FL      Dade       name2@
MN      Hennepin   name11 
MN      Hennepin   name3#

Dataset #2 has names that almost match the Agency_Names in Dataset #1

State   County     Agency_Name  Address   agency_code
FL      Broward    name1        address1  345
FL      Dade       name2        address2  654
MN      Hennepin   name1        address3  234
MN      Hennepin   name3        address4  776

I can select the best fuzzy match out of all names in the dataset using this:

from fuzzywuzzy import process
import rapidfuzz

df1['agency_match'] = df1['Agency_Name'].map(lambda x: process.extractOne(x,df2['Agency_Name'], scorer=rapidfuzz.string_metric.normalized_levenshtein)[0])

However, this match doesn't work because it matches Agency_Names from different states and counties. I need the fuzzy match to pick the best match only from within the same state and county.
What would be an elegant way to do this?

dave
  • 31
  • 1
  • 2

0 Answers0