I stumbled across this post that I have been referencing: Apply fuzzy matching across a dataframe column and save results in a new column. The code I am referencing is in the answer section and uses fuzzywuzzy and pandas. It uses fuzzywuzzy to find duplicate rows across 2 dataframes. I am aiming to modify this code so I can check for duplicate rows within a single dataframe. Here is the code I have so far:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import pandas as pd
import sqlalchemy
import pyodbc
con = sqlalchemy.create_engine('mssql+pyodbc://(localdb)\\LocalDBDemo/master?driver=ODBC+Driver+13+for+SQL+Server')
compare = pd.read_sql_table('PIM', con)
def metrics(tup):
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])
compare.apply(metrics)
#df1
#compare.apply(metrics).unstack().idxmax().unstack(0)
#df2
#compare.apply(metrics).unstack(0).idxmax().unstack(0)
Any help would be appreciated! I am still very much a noob so please bear with me. Thanks!