Matching strings in a pandas dataframe using fuzzywuzzy

Question

I have two dataframes: Instructor_Info and Operator_Info.

Instructor_Info contains two columns, Names and OperatorName, and Operator_Info also has a column called Names. Every name in Instructor_Info has an associated name in Operator_Info. I want to use fuzz.token_sort_ratio() to find these matches by comparing each name in Instructor_Info to every name in Operator_Info and storing the string with the highest score in the OperatorName column.

This is what I have so far:

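from fuzzywuzzy import fuzz

for index, row in Instructor_Info.iterrows():
    match = 0  # best score seen so far for this instructor
    for index1, row1 in Operator_Info.iterrows():
        score = fuzz.token_sort_ratio(row['Names'], row1['Names'])
        if score > match:
            match = score  # remember the new best score
            # write through .at; assigning to row only changes a copy
            Instructor_Info.at[index, 'OperatorName'] = row1['Names']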

This code runs extremely slowly and produces a couple of false matches (I can fix those manually, so speed is the main issue). If anyone has a faster approach, it would be much appreciated. Thanks in advance.

[I suggest using difflib instead, it's much faster.](https://stackoverflow.com/questions/56521625/quicker-way-to-perform-fuzzy-string-match-in-pandas/56521804#56521804) — cs95, Jun 13 '19 at 19:01
I second the use of difflib. Here is [another example](https://stackoverflow.com/a/13680953/3639023) of its usage with pandas — johnchase, Jun 13 '19 at 19:03
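
A minimal sketch of the difflib suggestion from the comments, not a tested replacement for the loop in the question. It uses only the standard library's difflib.get_close_matches. The helper names token_sort and best_matches and the sample names are hypothetical, and sorting the lowercased tokens before comparing is an assumption meant to roughly imitate token_sort_ratio(), so the scores will not be identical to fuzzywuzzy's.

```python
import difflib


def token_sort(name):
    # Lowercase, split on whitespace, and sort the tokens so word order is ignored.
    return " ".join(sorted(name.lower().split()))


def best_matches(names, candidates):
    # Map each normalized candidate back to its original spelling.
    normalized = {token_sort(c): c for c in candidates}
    keys = list(normalized)
    results = []
    for name in names:
        # n=1 keeps only the top-scoring candidate; cutoff=0.0 accepts any score,
        # so a match is returned whenever there is at least one candidate.
        hit = difflib.get_close_matches(token_sort(name), keys, n=1, cutoff=0.0)
        results.append(normalized[hit[0]] if hit else None)
    return results


# Hypothetical sample data standing in for the two Names columns.
instructors = ["Smith Jon", "Jane Doe"]
operators = ["Doe Jane", "John Smith", "Alice Brown"]
matched = best_matches(instructors, operators)
```

With the dataframes from the question, the result could be assigned in one step, for example Instructor_Info['OperatorName'] = best_matches(Instructor_Info['Names'], Operator_Info['Names']), which avoids writing to the dataframe row by row. Raising cutoff above 0.0 would leave weak matches as None, so that they can be reviewed by hand.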
