Following this answer, I am also trying to obtain the string match scores between two lists. What would be the best way to do that?
import pandas as pd

elements = pd.DataFrame({'name': ['vikash', 'vikas', 'Vinod', 'Vikky', 'Akash', 'Vinodh', 'Sachin', 'Salman', 'Ajay', 'Suchin', 'Akash', 'vikahs']})
elements2 = pd.DataFrame({'name': ['Ajay1', 'Suchin', 'Akassh', 'vikahs', 'vikash', 'vikash', 'Vinodh', 'Viky', 'Akash', 'Vinodh', 'Sachin', 'Salman', 'saman', 'Vikky']})
What I have tried so far:
import numpy as np
from rapidfuzz.process import cdist

# cdist needs sequences of strings, so pull the 'name' columns out of the DataFrames
queries = elements['name'].tolist()
choices = elements2['name'].tolist()
# Calculate the similarity between all pairs of names; scores below 90 are returned as 0
sa = cdist(queries, choices, score_cutoff=90, workers=-1)
duplicates_list = []
score_list = []
for distances in sa:
    # Get indices of near-duplicates (skip exact matches and scores below the cutoff)
    indices = np.argwhere(~np.isin(distances, [100, 0])).flatten()
    # Get names from indices
    names = [choices[i] for i in indices]
    duplicates_list.append(names)
    # Get the scores at the same indices so they line up with the names
    score_list.append(distances[indices].tolist())
# Create dataframe using the data
df = pd.DataFrame({'name': queries, 'duplicates': duplicates_list, 'score': score_list})
For each name in elements, I am trying to obtain the matching names from elements2 and also their scores.
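To make the expected result concrete, here is a small pure-Python sketch of the shape I am after: each name mapped to a list of (match, score) pairs. It is only an illustration under assumptions: it uses the standard library's difflib.SequenceMatcher as a stand-in for the rapidfuzz scorer, so the scores will not be identical to what cdist returns, and the helper names (similarity, matches_with_scores) and the cutoff of 80 are hypothetical.

```python
from difflib import SequenceMatcher

# Shortened sample data taken from the lists above
names = ['vikash', 'vikas', 'Vinod']
candidates = ['vikash', 'vikahs', 'Vinodh', 'Sachin']


def similarity(a, b):
    # Similarity on a 0-100 scale; a stand-in for the rapidfuzz scorer (assumption)
    return 100 * SequenceMatcher(None, a, b).ratio()


def matches_with_scores(queries, choices, cutoff=80):
    # For every query, keep each choice whose score reaches the cutoff,
    # paired with its rounded score so names and scores stay aligned
    result = {}
    for query in queries:
        pairs = []
        for choice in choices:
            s = similarity(query, choice)
            if s >= cutoff:
                pairs.append((choice, round(s, 1)))
        result[query] = pairs
    return result


matches = matches_with_scores(names, candidates)
for name, pairs in matches.items():
    print(name, pairs)
```

Keeping the name and the score in the same tuple avoids the two lists drifting out of sync, which is the problem I keep running into with separate masks.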