I am trying to use fuzzy matching to snap a list of responses to a validation set.
I am using the following code:
for x in rawDatabase.Status:
    choice = process.extractOne(x, my_list)
    print('choice ', choice)
Here, the Status column of the rawDatabase data frame is the column I am trying to validate, and my_list is a list of standardized values that the entries in the Status column should snap to.
Using the above code, I get the following sample output:
choice ('TRANSFER IN FROM GOVERNMENT DEPARTMENT', 100, 39)
choice ('TRANSFER OUT TO GOVERNMENT DEPARTMENT', 100, 40)
choice ('CURRENT', 100, 1)
choice ('LEAVER - RETIRED', 100, 12)
choice ('CURRENT', 100, 1)
Is there a way to return only the value that best fits the string being tested, and to update the Status column of rawDatabase with that value? For example, I would like to get:
choice = 'TRANSFER IN FROM GOVERNMENT DEPARTMENT'
choice = 'TRANSFER OUT TO GOVERNMENT DEPARTMENT'
choice = 'CURRENT'
choice = 'LEAVER - RETIRED'
choice = 'CURRENT'
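
One possible approach, as a minimal sketch: each result printed above is a tuple whose first element is the matched string, so indexing it with [0] keeps only that value. The sketch below is not the original setup. To stay runnable without extra packages, it uses a hypothetical stand-in extract_one built on the standard library's difflib instead of process.extractOne, and plain lists (my_list and statuses hold made-up sample values) instead of the rawDatabase data frame.

```python
from difflib import SequenceMatcher


def extract_one(query, choices):
    """Hypothetical stand-in for process.extractOne (an assumption, not the real API).

    Returns a (best_match, score) tuple, with the score scaled to 0-100.
    """
    scored = [
        (choice, round(100 * SequenceMatcher(None, query.upper(), choice.upper()).ratio()))
        for choice in choices
    ]
    # Pick the candidate with the highest similarity score.
    return max(scored, key=lambda pair: pair[1])


# Made-up sample data standing in for my_list and the Status column.
my_list = ["CURRENT", "LEAVER - RETIRED", "TRANSFER IN FROM GOVERNMENT DEPARTMENT"]
statuses = ["current", "Leaver - retired", "CURRENT"]

# Element 0 of each result tuple is the matched string; the score is dropped.
cleaned = [extract_one(s, my_list)[0] for s in statuses]
print(cleaned)
```

With the real library and data frame, the same indexing would presumably look like rawDatabase['Status'] = rawDatabase['Status'].apply(lambda x: process.extractOne(x, my_list)[0]), assuming process.extractOne always returns a tuple for these inputs.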