
I have two DataFrames: Instructor_Info and Operator_Info.

Instructor_Info contains the columns Names and OperatorName, and Operator_Info also has a Names column. Every name in Instructor_Info has a corresponding name in Operator_Info. I want to use fuzz.token_sort_ratio() to find these matches: compare each name in Instructor_Info to every name in Operator_Info and store the highest-scoring name in the OperatorName column.
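Roughly, `token_sort_ratio` normalizes both names, sorts their words, and compares the rejoined strings with a 0 to 100 similarity ratio, so word order stops mattering. The sketch below approximates that with the standard library's `difflib.SequenceMatcher`. It is a stand-in under that assumption, not fuzzywuzzy's own code (which may use python-Levenshtein), so scores can differ slightly. `token_sort_score` is a hypothetical name.

```python
import re
from difflib import SequenceMatcher


def _token_sort_key(name: str) -> str:
    # Lowercase, replace runs of non-ASCII-alphanumeric characters with a space,
    # then sort the words so "Smith, John" and "john smith" give the same key.
    cleaned = re.sub(r"[^0-9a-z]+", " ", name.lower())
    return " ".join(sorted(cleaned.split()))


def token_sort_score(a: str, b: str) -> int:
    """Hypothetical approximation of fuzz.token_sort_ratio using difflib (0 to 100)."""
    ratio = SequenceMatcher(None, _token_sort_key(a), _token_sort_key(b)).ratio()
    return round(100 * ratio)


print(token_sort_score("Smith, John", "john smith"))  # 100
print(token_sort_score("Jon Smith", "John Smith"))    # 95
```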

This is what I have so far:

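```python
from fuzzywuzzy import fuzz

for index, row in Instructor_Info.iterrows():
    best_score = 0
    for index1, row1 in Operator_Info.iterrows():
        score = fuzz.token_sort_ratio(row['Names'], row1['Names'])
        if score > best_score:  # track the highest score, not just "> 0"
            best_score = score
            # rows from iterrows() can be copies; write through .at so the change sticks
            Instructor_Info.at[index, 'OperatorName'] = row1['Names']
```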

This code runs extremely slowly and produces a couple of false matches (I can fix those manually, so speed is the main issue). Any ideas for making it faster would be much appreciated. Thanks in advance.

DataScience99
  • [I suggest using difflib instead; it's much faster.](https://stackoverflow.com/questions/56521625/quicker-way-to-perform-fuzzy-string-match-in-pandas/56521804#56521804) – cs95 Jun 13 '19 at 19:01
  • I second the use of difflib. Here is [another example](https://stackoverflow.com/a/13680953/3639023) of its usage with pandas. – johnchase Jun 13 '19 at 19:03
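Following the difflib suggestion, a minimal `difflib.get_close_matches` version might look like the sketch below. `get_close_matches` compares raw strings, so word order matters; this sketch assumes it is acceptable to match on word-sorted keys and map each key back to the original operator name. The `cutoff=0.0` choice (always return the closest candidate, mirroring "take the highest score") and the `closest_operator` helper name are assumptions, and the similarity comes from `SequenceMatcher`, not fuzzywuzzy.

```python
import difflib
import re


def _key(name) -> str:
    # Lowercase, replace non-alphanumeric runs with a space, and sort the words.
    return " ".join(sorted(re.sub(r"[^0-9a-z]+", " ", str(name).lower()).split()))


def closest_operator(name, key_to_operator, cutoff=0.0):
    """Hypothetical helper: operator name whose key is closest to name's key, or None."""
    # n=1 keeps only the best candidate; cutoff=0.0 means "always return the closest one".
    hits = difflib.get_close_matches(_key(name), list(key_to_operator), n=1, cutoff=cutoff)
    return key_to_operator[hits[0]] if hits else None


operator_names = ["John Smith", "Jane Doe", "Robert Brown"]  # stand-in for Operator_Info['Names']
key_to_operator = {}
for op in operator_names:
    key_to_operator.setdefault(_key(op), op)  # build the lookup once; first name wins on ties

for instructor in ["Smith, John", "Jane Do", "Bob Brown"]:
    print(instructor, "->", closest_operator(instructor, key_to_operator))
```

With the question's DataFrames, the lookup could be built from `Operator_Info['Names']` and applied with `Instructor_Info['Names'].map(lambda n: closest_operator(n, key_to_operator))`. Raising `cutoff` (for example to 0.6, the function's default) would return None for weak matches instead of a guess, which may help flag the false matches mentioned in the question.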
