I have two data frames whose name values don't match exactly, for example "John Johnson" vs. "John Johnson Jr.". I need to match names whose similarity exceeds a certain threshold. I'm using fuzzywuzzy, but I can't find an efficient way to do this. I've tried iterating through both data frames like this:

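from fuzzywuzzy import fuzz

# Compare every name in df against every name in df2 (len(df) * len(df2) comparisons)
for index, _ in df.iterrows():
    for index_two, _ in df2.iterrows():
        if fuzz.ratio(df.at[index, 'Name'], df2.at[index_two, 'Name']) > 85:
            df.at[index, 'value I want to add to first df'] = df2.at[index_two, 'value']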

I've also tried the approach from the question "Is it possible to do fuzzy match merge with python pandas?"

I've also tried this example: https://www.py4u.net/discuss/162793

All three ways are extremely slow and inefficient. What am I doing wrong here?

Austin
You could use something like https://stackoverflow.com/questions/69344187/python-for-loop-taking-too-much-time/69349968#69349968 to find all indexes which should be mapped. – maxbachmann Oct 06 '21 at 13:05

1 Answer

It's best to use a dedicated library for this. See this example, which links two data frames with the recordlinkage package: https://recordlinkage.readthedocs.io/en/latest/notebooks/link_two_dataframes.html
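
Much of the speedup from a record linkage approach comes from blocking: candidate pairs are first narrowed down with a cheap key, and only those pairs are scored. Below is a minimal standard-library sketch of that idea, not the recordlinkage API. The helper names (block_key, similarity, match_names) and the choice of the first word as the blocking key are assumptions made for illustration, and difflib's SequenceMatcher stands in for fuzzywuzzy's ratio, so scores may differ slightly. Plain lists stand in for the 'Name' columns.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(name):
    """Hypothetical blocking key: the lowercased first word of the name."""
    parts = name.lower().split()
    return parts[0] if parts else ""


def similarity(a, b):
    """Similarity on a 0-100 scale (a stand-in for a fuzzywuzzy-style ratio)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_names(left, right, threshold=85):
    """Map each left name to its best-scoring right name at or above threshold."""
    # Group the right-hand names by key so each left name is compared only
    # against names in its own block, not against every row.
    blocks = defaultdict(list)
    for name in right:
        blocks[block_key(name)].append(name)

    matches = {}
    for name in left:
        best_name, best_score = None, threshold
        for candidate in blocks.get(block_key(name), []):
            score = similarity(name, candidate)
            if score >= best_score:
                best_name, best_score = candidate, score
        if best_name is not None:
            matches[name] = best_name
    return matches


# Illustrative data only; in practice these would come from df['Name'] and df2['Name'].
left_names = ["John Johnson", "Mary Smith", "Alan Turing"]
right_names = ["John Johnson Jr.", "Mary Smyth", "Grace Hopper"]
result = match_names(left_names, right_names)
```

The resulting dictionary can be turned into a column with something like df['Name'].map(result) and then merged against df2 as usual. The trade-off is that blocking on the first word will miss pairs whose first words differ (for example "Jon" vs. "John"), so the key should be chosen to suit the data.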

SultanOrazbayev