-1

I have an Excel file with two columns of names. I need to compare the two columns side by side (row by row) and write a fuzzy match score to another column.

Any idea how to do this?

InsaneCat
nehaj

2 Answers

1

You can use the fuzzywuzzy module to calculate the fuzzy score between the two items on each row, iterating over the rows. If your dataset is very long, this could probably be vectorized. This article got me going with fuzzywuzzy last week: https://marcobonzanini.com/2015/02/25/fuzzy-string-matching-in-python/
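As a rough illustration of the row-by-row idea, here is a minimal sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in so it runs without extra installs. With fuzzywuzzy installed, `fuzz.ratio(a, b)` could take the place of the hypothetical `fuzzy_score` helper. The sample names are made up; in practice the two columns would presumably be loaded from the Excel file first (for example with `pandas.read_excel`).

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two strings (case-insensitive)."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return round(100 * ratio)


# Hypothetical rows standing in for the two Excel columns.
rows = [
    ("John Smith", "Jon Smith"),
    ("Maria Garcia", "Maria Garcia"),
    ("Wei Chen", "Olga Petrova"),
]

# Build the third column by scoring the pair of names on each row.
scored = [(left, right, fuzzy_score(left, right)) for left, right in rows]

for left, right, score in scored:
    print(f"{left!r} vs {right!r}: {score}")
```

Identical names score 100, and unrelated names score much lower, so the third column can be sorted or filtered to find likely mismatches.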

Altycoder
0

I've implemented this in Python with parallel processing, which should be much faster than serial computation on large inputs. Furthermore, only the computations where the fuzzy metric score exceeds a threshold are performed in parallel. Please see the link below for the code:

https://github.com/ankitcoder123/Important-Python-Codes/blob/main/Faster%20Fuzzy%20Match%20between%20two%20columns/Fuzzy_match.py

Version compatibility:

pandas version: 1.1.5,
numpy version: 1.19.5,
fuzzywuzzy version: 1.1.0,
joblib version: 0.18.0
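For readers who cannot open the link, the general shape of a concurrent, thresholded comparison might look like the sketch below. This is not the linked code: it is one possible reading of the description, with the hypothetical helpers `score_pair` and `match_above_threshold`, an assumed threshold of 80, and `difflib` standing in for fuzzywuzzy. Threads are used only to keep the sketch portable. For CPU-bound scoring they are unlikely to give a real speedup, and a process pool (or joblib, as the answer uses) would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def score_pair(pair):
    """Score one (left, right) pair on a 0-100 scale."""
    left, right = pair
    ratio = SequenceMatcher(None, left.lower(), right.lower()).ratio()
    return left, right, round(100 * ratio)


def match_above_threshold(pairs, threshold=80, max_workers=4):
    """Score all pairs concurrently and keep those at or above the threshold."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(score_pair, pairs))
    return [r for r in results if r[2] >= threshold]


# Hypothetical sample pairs; the threshold of 80 is an assumption.
pairs = [("John Smith", "Jon Smith"), ("Wei Chen", "Olga Petrova")]
matches = match_above_threshold(pairs)
print(matches)
```

`pool.map` preserves input order, so the surviving rows stay aligned with the original sheet.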



Ankit Kamboj