
I am trying to use fuzzy matching to snap a list of responses to a validation set.

I am using the following code:

from fuzzywuzzy import process

for x in rawDatabase.Status:
    choice = process.extractOne(x, my_list)
    print('choice ', choice)

Here, Status is the column of the rawDatabase DataFrame that I am trying to validate, and my_list is a list of the standardized values that the Status entries should snap to.

Using the above code I get the following sample output:

choice  ('TRANSFER IN FROM GOVERNMENT DEPARTMENT', 100, 39)
choice  ('TRANSFER OUT TO GOVERNMENT DEPARTMENT', 100, 40)
choice  ('CURRENT', 100, 1)
choice  ('LEAVER - RETIRED', 100, 12)
choice  ('CURRENT', 100, 1)
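The third element in each tuple suggests that my_list is a pandas Series or a dict: for dict-like choices, fuzzywuzzy's extractOne returns (match, score, key), while for a plain list it returns (match, score). Either way, the matched string is element [0]. To illustrate that shape without fuzzywuzzy, here is a hypothetical stdlib stand-in built on difflib.SequenceMatcher. Its scores come from difflib's ratio(), not fuzzywuzzy's default scorer, so the numbers will not match the ones above.

```python
from difflib import SequenceMatcher


def extract_one(query, choices):
    """Hypothetical stand-in for fuzzywuzzy's process.extractOne.

    Returns (best_choice, score, index), with score on a 0-100 scale taken
    from difflib's SequenceMatcher.ratio(), or None if choices is empty.
    Unlike fuzzywuzzy's default processing, this comparison is case-sensitive.
    """
    best = None
    for index, choice in enumerate(choices):
        score = round(100 * SequenceMatcher(None, query, choice).ratio())
        # Keep the first choice that reaches the highest score
        if best is None or score > best[1]:
            best = (choice, score, index)
    return best


my_list = ['CURRENT', 'LEAVER - RETIRED']
result = extract_one('CURENT', my_list)
print(result)     # ('CURRENT', 92, 0)
print(result[0])  # CURRENT
```

Indexing with [0] is what turns the tuple into the plain string the question asks for.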

Is there a way to return only the value that best fits the string being tested, and to update the Status column of rawDatabase with that value? For example, I would like to get back:

choice = 'TRANSFER IN FROM GOVERNMENT DEPARTMENT'
choice = 'TRANSFER OUT TO GOVERNMENT DEPARTMENT'
choice = 'CURRENT'
choice = 'LEAVER - RETIRED'
choice = 'CURRENT'
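One way to get just the string and write it back to the column is to resolve each distinct raw value once, then replace every entry through that lookup. The sketch below is a hypothetical stdlib version: it swaps fuzzywuzzy for difflib.get_close_matches (with cutoff=0.0 so a best match is always returned when choices exist), and a plain list stands in for the Status column. The misspelled sample values are made up for illustration.

```python
from difflib import get_close_matches


def best_match(value, choices):
    """Return the closest entry in choices, or value itself if choices is empty."""
    matches = get_close_matches(value, choices, n=1, cutoff=0.0)
    return matches[0] if matches else value


def snap_column(values, choices):
    """Snap every value to its closest standard value.

    Each distinct raw value is matched only once, which saves work when the
    same status (e.g. 'CURRENT') appears on many rows.
    """
    lookup = {raw: best_match(raw, choices) for raw in set(values)}
    return [lookup[raw] for raw in values]


my_list = [
    'TRANSFER IN FROM GOVERNMENT DEPARTMENT',
    'TRANSFER OUT TO GOVERNMENT DEPARTMENT',
    'CURRENT',
    'LEAVER - RETIRED',
]
# Hypothetical raw values, including two misspellings
status = ['CURENT', 'LEAVER RETIRED', 'CURRENT', 'TRANSFER IN FROM GOVERNMENT DEPARTMENT']
status = snap_column(status, my_list)
print(status)
# ['CURRENT', 'LEAVER - RETIRED', 'CURRENT', 'TRANSFER IN FROM GOVERNMENT DEPARTMENT']
```

With pandas, the same idea would be to build lookup from rawDatabase['Status'].unique() and then assign rawDatabase['Status'] = rawDatabase['Status'].map(lookup), which overwrites the column rather than adding a new one.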
Stacey
  • Possible duplicate of [Fuzzy string comparison in Python, confused with which library to use](https://stackoverflow.com/questions/6690739/fuzzy-string-comparison-in-python-confused-with-which-library-to-use) – Jan Oct 18 '17 at 15:13
  • Use the `Levenshtein` distance or `difflib`. – Jan Oct 18 '17 at 15:13
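Following the comment's suggestion, a snap can also be based on plain Levenshtein (edit) distance: pick the standard value that needs the fewest single-character insertions, deletions, or substitutions. This is a minimal, hypothetical sketch in pure Python (it does not use the Levenshtein package). It is case-sensitive and compares raw distances rather than a length-normalized similarity, so its rankings can differ from a ratio-based scorer's.

```python
def levenshtein(a, b):
    """Edit distance between a and b via the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def nearest(value, choices):
    """Return the choice with the smallest edit distance to value (first wins ties)."""
    return min(choices, key=lambda choice: levenshtein(value, choice))


print(levenshtein('kitten', 'sitting'))                     # 3
print(nearest('currret', ['current', 'loan', 'transfer']))  # current
```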

1 Answer


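Modify your code so that it keeps only the matched string (element [0] of the tuple returned by extractOne) and collects the results into a new column: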

from fuzzywuzzy import process

l1 = []
for x in rawDatabase.Status:
    # extractOne returns a tuple; element [0] is the matched string
    choice = process.extractOne(x, my_list)[0]
    l1.append(choice)
rawDatabase['choice'] = l1
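The loop above always takes the top candidate, even when it is a poor match. If raw values should be left alone when nothing in the standard list is close, fuzzywuzzy's extractOne accepts a score_cutoff argument and returns None when no choice reaches it. The stdlib sketch below shows the same keep-the-original idea with difflib.get_close_matches. The 0.8 threshold is an arbitrary assumption to tune on real data, and difflib's 0-1 ratio is not on the same scale as fuzzywuzzy's 0-100 score.

```python
from difflib import get_close_matches


def snap_or_keep(value, choices, cutoff=0.8):
    """Return the closest choice if its similarity is at least cutoff, else value unchanged.

    cutoff=0.8 is an assumed threshold, not a recommended one.
    """
    matches = get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value


choices = ['current', 'loan', 'transfer']
print(snap_or_keep('curent', choices))   # current (ratio about 0.92)
print(snap_or_keep('loaning', choices))  # loaning (best ratio about 0.73, below 0.8)
```

The second call shows the trade-off: a strict cutoff protects unrelated values from being overwritten, but it can also leave genuine variants such as 'loaning' unsnapped.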

A more complete example:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

a = []
for x in df.response:
    # extract(..., limit=1) returns a list holding one tuple; [0][0] is the matched string
    a.append(process.extract(x, val.validate, limit=1)[0][0])
df['response2'] = a
df

Out[867]: 
   id  colour response response2
0   1    blue   curent   current
1   2     red  loaning      loan
2   3  yellow  current   current
3   4   green     loan      loan
4   5     red  currret   current
5   6   green     loan      loan

Input data:

df:

id colour  response
 1   blue    curent 
 2    red   loaning
 3 yellow   current
 4  green      loan 
 5    red   currret
 6  green      loan

val:

validate
 current
    loan
transfer
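For comparison, the same input can be snapped with the standard library alone. This sketch swaps fuzzywuzzy for difflib.get_close_matches (an assumption, not what the answer uses) and replaces the two DataFrames with plain lists. On this small sample it produces the same response2 values as the fuzzywuzzy output above, although the two libraries score strings differently and can disagree on other data.

```python
from difflib import get_close_matches

# Plain-list stand-ins for df.response and val.validate from the example above
responses = ['curent', 'loaning', 'current', 'loan', 'currret', 'loan']
validate = ['current', 'loan', 'transfer']

# cutoff=0.0 so every response is mapped to its single best candidate
response2 = [get_close_matches(r, validate, n=1, cutoff=0.0)[0] for r in responses]

for original, snapped in zip(responses, response2):
    print(f'{original:>8} -> {snapped}')
```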
BENY