I have the following DataFrame (`df`) in pandas (this is just an example; the real DataFrame has more than 2000 rows and more than 20 different names):
ID | Name |
---|---|
1 | Andrea Gonzlez |
2 | Andrea Glz |
3 | Andrea Glez |
4 | Lineth Arce |
5 | lineth a |
6 | lineth aerc |
I want to compare the name in row 1 with the name in row 2, and if their similarity ratio is above 80%, replace the name in row 2 with the name in row 1. In the end, the column should contain only one spelling of each person's name.
What I did was create a list of names, `names = ['Andrea Glz', 'Lineth Arce']`, and then write this function:
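
    from fuzzywuzzy import fuzz  # fuzzywuzzy (also published as thefuzz)

    def compare(x):
        # Return the first name in `names` whose token-set ratio with x is above 80
        for i in names:
            ratio = fuzz.token_set_ratio(i, x)
            if ratio > 80:
                return i
        return x  # no close match: keep the original name
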
Then I overwrite the column with the matched result from the `names` list:

    df['Name'] = df['Name'].apply(compare)

I get the desired result, but it takes a lot of processing time. Is there an easier and faster way to do this?
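If most of the time goes into calling `compare` once per row, one common speedup is to score each distinct spelling only once and then broadcast the results with `Series.map`. The sketch below assumes the real column repeats the same spellings many times. So that it runs without extra packages, it swaps a lowercased `difflib.SequenceMatcher` ratio (0 to 1) in for `fuzz.token_set_ratio`; the `names` list and the `calls` counter are hypothetical, chosen only to reproduce the desired table.

```python
import difflib

import pandas as pd

# Hypothetical canonical list, chosen to match the desired output table.
names = ["Andrea Gonzlez", "Lineth Arce"]
calls = 0  # counts how many times compare() actually runs


def compare(x):
    """Return the first canonical name whose similarity to x exceeds 0.8."""
    global calls
    calls += 1
    for i in names:
        # difflib stand-in for fuzz.token_set_ratio, scaled to [0, 1]
        ratio = difflib.SequenceMatcher(None, x.lower(), i.lower()).ratio()
        if ratio > 0.8:
            return i
    return x  # no close match: keep the original name


rows = ["Andrea Gonzlez", "Andrea Glz", "Andrea Glez",
        "Lineth Arce", "lineth a", "lineth aerc"]
df = pd.DataFrame({"ID": range(1, 13), "Name": rows * 2})  # every spelling twice

# Score each distinct spelling once, then broadcast the result to all rows.
mapping = {n: compare(n) for n in df["Name"].unique()}
df["Name"] = df["Name"].map(mapping)
print(calls, len(df))  # prints: 6 12
```

Here `compare` runs 6 times for 12 rows; with 2000 rows, the saving depends on how many distinct spellings the column actually contains. The third-party rapidfuzz package also provides a `fuzz.token_set_ratio` and is often suggested as a faster replacement, though its default string preprocessing may differ from fuzzywuzzy's.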
Desired result table:
ID | Name |
---|---|
1 | Andrea Gonzlez |
2 | Andrea Gonzlez |
3 | Andrea Gonzlez |
4 | Lineth Arce |
5 | Lineth Arce |
6 | Lineth Arce |
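A possible way to get this table without writing the `names` list by hand is to treat the first spelling seen for each person as the canonical one and map later, similar spellings onto it. The sketch below assumes that rule. `canonicalize` and `similarity` are hypothetical helpers, and a lowercased `difflib.SequenceMatcher` ratio (0 to 1, and not always symmetric in its arguments) stands in for `fuzz.token_set_ratio`, so its 0.8 cut-off is not directly comparable to fuzz's 80.

```python
import difflib

import pandas as pd


def similarity(a: str, b: str) -> float:
    # Hypothetical stand-in for fuzz.token_set_ratio: case-insensitive
    # difflib ratio in [0, 1]; the result can change if a and b are swapped.
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def canonicalize(names: pd.Series, threshold: float = 0.8) -> dict:
    """Map each distinct name to the first-seen spelling it resembles."""
    reps = []     # canonical spellings, in order of first appearance
    mapping = {}  # original spelling -> canonical spelling
    for name in names.unique():  # Series.unique keeps order of appearance
        # Find the most similar canonical spelling found so far.
        best, best_score = None, 0.0
        for rep in reps:
            score = similarity(name, rep)
            if score > best_score:
                best, best_score = rep, score
        if best is not None and best_score > threshold:
            mapping[name] = best
        else:
            reps.append(name)  # nothing close enough: new canonical spelling
            mapping[name] = name
    return mapping


df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Name": ["Andrea Gonzlez", "Andrea Glz", "Andrea Glez",
             "Lineth Arce", "lineth a", "lineth aerc"],
})
df["Name"] = df["Name"].map(canonicalize(df["Name"]))
print(df)
```

Each distinct spelling is compared only against the canonical spellings found so far, so the work grows with the number of distinct names rather than the number of rows. Which spelling "wins" depends on row order, and on real data the scorer and threshold would likely need tuning.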