I have the following dataframe:
import pandas as pd

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6],
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })
   id      fruits
0   1       apple
1   2      apples
2   3      orange
3   4  apple tree
4   5     oranges
5   6       mango
I want to find fuzzy matches among the strings in the fruits column and build a new dataframe, like the expected result shown further down, that keeps the pairs whose ratio_score is higher than 80.
How can I do that in Python using the fuzzywuzzy package? Thanks. Please note that the ratio_score values in the expected result are made up as an example.
My solution:
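from fuzzywuzzy import fuzz
# Problem: this compares each value with its own copy, so every score is 100
df.loc[:, 'fruits_copy'] = df['fruits']
df['ratio_score'] = df[['fruits', 'fruits_copy']].apply(lambda row: fuzz.ratio(row['fruits'], row['fruits_copy']), axis=1)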
Expected result:
   id  fruits  matched_id matched_fruits  ratio_score
0   1   apple           2         apples           95
1   1   apple           4     apple tree           85
2   2  apples           4     apple tree           80
3   3  orange           5        oranges           95
4   6   mango
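To make the expected result concrete, here is a rough sketch of one way it could be produced: score every unordered pair of rows and keep those above the cutoff. The helper name fuzzy_pairs, the records list, and the threshold parameter are hypothetical names chosen for illustration. The difflib fallback is an assumption so the sketch still runs when fuzzywuzzy is not installed. Rows that match nothing are kept with empty match columns, as in the mango row above.

```python
from difflib import SequenceMatcher
from itertools import combinations

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption: difflib stands in for fuzz.ratio if fuzzywuzzy is missing
    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Same data as the dataframe above, as (id, fruits) tuples
records = [(1, 'apple'), (2, 'apples'), (3, 'orange'),
           (4, 'apple tree'), (5, 'oranges'), (6, 'mango')]

def fuzzy_pairs(records, threshold=80):
    """Return (id, fruits, matched_id, matched_fruits, ratio_score) rows."""
    rows, matched = [], set()
    # combinations() yields each unordered pair once, never a row with itself
    for (id_a, a), (id_b, b) in combinations(records, 2):
        score = ratio(a, b)
        if score > threshold:
            rows.append((id_a, a, id_b, b, score))
            matched.update((id_a, id_b))
    # Keep ids that matched nothing, with empty match columns
    rows += [(i, s, None, None, None) for i, s in records if i not in matched]
    return rows

rows = fuzzy_pairs(records)
```

With pandas, records could be built with list(df[['id', 'fruits']].itertuples(index=False, name=None)), and the result turned back into a dataframe with pd.DataFrame(rows, columns=['id', 'fruits', 'matched_id', 'matched_fruits', 'ratio_score']). Real scores will differ from the made-up ones in the expected result, so some pairs listed there may fall below the cutoff.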
Related references:
Fuzzy matching a sorted column with itself using python
Apply fuzzy matching across a dataframe column and save results in a new column
How do I fuzzy match items in a column of an array in python?
Using fuzzywuzzy to create a column of matched results in the data frame