
I am struggling with the basics. I have a pandas DataFrame with a single column of names, and I want to compare the strings for potential duplicates using 3-4 functions from the fuzzywuzzy library. The first name should be checked against the rest of the column, then the second name, and so on. The column will hold hundreds if not thousands of names. I want to build a DataFrame of the name pairs for which at least one of the scores is above 80.

Do I need to create a list out of that df? Apologies, I know this is very basic, but I can't seem to find a solution myself.

cnns
  • Does this answer your question? [Pandas fuzzy detect duplicates](https://stackoverflow.com/questions/39490190/pandas-fuzzy-detect-duplicates) – johannesack Mar 01 '20 at 14:10
  • Hi @cnns, welcome to SO! Please try to provide a reproducible example for your question (see [here](https://stackoverflow.com/help/minimal-reproducible-example)). – jkd Mar 01 '20 at 14:12

1 Answer


In the end I found a different approach to my issue. Instead of comparing the full 80k list against itself, I used itertools.combinations, which yields every unique pair of names exactly once, so no pair is scored twice and no name is compared with itself. That is perfect for this scenario.
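Roughly, the idea can be sketched like this. It is only a minimal illustration, not my exact code: the `ratio` helper is a standard-library stand-in built on difflib so the snippet runs without extra packages, and `find_candidate_duplicates`, the sample names, and the `threshold` parameter are made up for the example. With fuzzywuzzy installed, scorers such as `fuzz.ratio` or `fuzz.token_sort_ratio` could be passed in the `scorers` list instead.

```python
from difflib import SequenceMatcher
from itertools import combinations


def ratio(a, b):
    # Stand-in scorer on a 0-100 scale (assumption: used here only so the
    # sketch needs no third-party package; fuzzywuzzy scorers can replace it).
    return round(100 * SequenceMatcher(None, a, b).ratio())


def find_candidate_duplicates(names, scorers, threshold=80):
    """Return (name_a, name_b, score_1, ...) for pairs where any score > threshold."""
    rows = []
    # combinations(names, 2) yields each unordered pair once: (A, B) but not (B, A).
    for a, b in combinations(names, 2):
        scores = [scorer(a, b) for scorer in scorers]
        if max(scores) > threshold:
            rows.append((a, b, *scores))
    return rows


# Hypothetical sample data standing in for the name column.
names = ["John Smith", "Jon Smith", "Alice Brown"]
pairs = find_candidate_duplicates(names, [ratio])
print(pairs)
```

To answer the original question: yes, a plain list is enough as input. Assuming the column is called `name` (a placeholder), `df["name"].tolist()` gives the list, and the returned rows can be wrapped in `pd.DataFrame(...)` with one column per scorer. Note that the number of pairs is n(n-1)/2, so the work still grows quadratically with the number of names.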

cnns