Using fuzzyWuzzy to efficiently join two pandas dataframes on Name value

Question

I have two data frames whose name values don't match exactly, for example John Johnson vs. John Johnson Jr. I need to match these names above a certain similarity threshold. I'm using fuzzywuzzy, but I can't find a way to do this efficiently. I've tried iterating through both data frames like this:

from fuzzywuzzy import fuzz

# Compare every row of df with every row of df2 (n * m fuzzy comparisons).
for index, _ in df.iterrows():
    for index_two, _ in df2.iterrows():
        if fuzz.ratio(df.at[index, 'Name'], df2.at[index_two, 'Name']) > 85:
            df.at[index, 'value I want to add to first df'] = df2.at[index_two, 'value']

I've also tried the approach from "Is it possible to do fuzzy match merge with python pandas?"

and this example: https://www.py4u.net/discuss/162793

All three approaches are extremely slow. What am I doing wrong here?
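A likely cause of the slowness is that `iterrows` and per-cell `.at` access run n × m times in pure Python. The following is a minimal sketch, assuming the standard library's `difflib` is an acceptable stand-in for fuzzywuzzy's `fuzz.ratio` (fuzzywuzzy itself falls back to `difflib.SequenceMatcher` when python-Levenshtein is not installed). It pulls the names out once, scores each distinct name only once, and lets `difflib.get_close_matches` pick the best candidate. That function rejects most candidates with cheap upper-bound checks before computing the full ratio. The helper name `build_lookup` and the sample names are hypothetical; the cutoff of 0.85 mirrors the question's threshold of 85.

```python
import difflib


def build_lookup(left_names, right_rows, cutoff=0.85):
    """Hypothetical helper: map each left name to the value of its closest right name.

    right_rows is an iterable of (name, value) pairs, e.g. zip(df2['Name'], df2['value']).
    Names with no candidate scoring >= cutoff (0..1 scale) are left out.
    """
    # Read the right-hand names once instead of re-reading cells in a nested loop.
    right_values = {}
    for name, value in right_rows:
        right_values.setdefault(name, value)  # keep the first value for duplicate names
    candidates = list(right_values)

    lookup = {}
    for name in dict.fromkeys(left_names):  # distinct names, original order
        best = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
        if best:
            lookup[name] = right_values[best[0]]
    return lookup


left = ["John Johnson", "Mary Smith", "Alan Turing"]
right = [("John Johnson Jr.", 1), ("Mary Smyth", 2), ("Grace Hopper", 3)]
lookup = build_lookup(left, right)
print(lookup)  # {'John Johnson': 1, 'Mary Smith': 2}
```

With pandas, the result could be applied with something like `df['value'] = df['Name'].map(build_lookup(df['Name'], zip(df2['Name'], df2['value'])))`, which leaves unmatched rows as NaN. This is still O(n × m) in the worst case. It only removes the per-row pandas overhead and skips hopeless candidates early.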

You could use something like https://stackoverflow.com/questions/69344187/python-for-loop-taking-too-much-time/69349968#69349968 to find all indices that should be mapped. — maxbachmann, Oct 06 '21 at 13:05

score 1 · Accepted Answer · answered Oct 05 '21 at 15:34

It's best to use a dedicated record-linkage library for this; see this example from the recordlinkage documentation: https://recordlinkage.readthedocs.io/en/latest/notebooks/link_two_dataframes.html
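Record-linkage libraries such as recordlinkage get much of their speed from indexing (also called blocking). They first generate a reduced set of candidate pairs and only score those, rather than all n × m pairs. The following is a minimal hand-rolled sketch of that idea, not recordlinkage's API. It makes two assumptions. First, the blocking key is the lower-cased first letter of the first token, a hypothetical choice that real data may need to replace with something sturdier. Second, scoring again uses `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy, rounded to its 0-100 scale. The helper names `blocking_key` and `block_and_match` are illustrative.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(name):
    # Hypothetical key: first character of the first token, lower-cased.
    tokens = name.split()
    return tokens[0][0].lower() if tokens else ""


def block_and_match(left_names, right_names, threshold=85):
    """Return (left_index, right_index, score) for candidate pairs scoring >= threshold.

    Only pairs that share a blocking key are compared, instead of all n * m pairs.
    """
    # Group right-hand row positions by blocking key.
    blocks = defaultdict(list)
    for j, name in enumerate(right_names):
        blocks[blocking_key(name)].append(j)

    matches = []
    for i, name in enumerate(left_names):
        # Score only the right-hand rows in the same block.
        for j in blocks.get(blocking_key(name), []):
            score = round(100 * SequenceMatcher(None, name, right_names[j]).ratio())
            if score >= threshold:
                matches.append((i, j, score))
    return matches


left = ["John Johnson", "Mary Smith"]
right = ["Grace Hopper", "John Johnson Jr.", "Mary Smyth", "Jon Johnson"]
print(block_and_match(left, right))  # [(0, 1, 86), (0, 3, 96), (1, 2, 90)]
```

The output lists row positions that should be mapped. "John Johnson" passes the threshold against two right-hand rows, so the caller still has to pick one, for example the highest score per left row. The trade-off is recall: a key that is too strict (such as a typo in the first letter) silently skips true matches. recordlinkage offers alternative indexing strategies, such as sorted-neighbourhood indexing, to soften that.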

SultanOrazbayev
