Basic question - iterating through pandas dataframe column using a function

Question

I am struggling with the basics. I have a pandas DataFrame with a single column of names, and I want to find potential duplicates by comparing the strings with 3-4 functions from the fuzzywuzzy library. I want to check the first name against the rest of the column, then the second name, and so on. The column will have hundreds, if not thousands, of names. I want to create a DataFrame of the name pairs for which at least one of the scores is above 80.

Do I need to create a list out of that DataFrame? Apologies, I know this is very basic; I just can't seem to find a solution myself.

Does this answer your question? [Pandas fuzzy detect duplicates](https://stackoverflow.com/questions/39490190/pandas-fuzzy-detect-duplicates) — johannesack, Mar 01 '20 at 14:10
Hi @cnns, welcome to SO! Please try to provide a reproducible example for your question (see [here](https://stackoverflow.com/help/minimal-reproducible-example)). — jkd, Mar 01 '20 at 14:12
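A minimal sketch of the plan described in the question (first name against every later name, then the second, and so on), assuming the names have been pulled into a plain Python list. A list is not strictly required, but positional indexing on a list keeps the loop simple. The two scorers are difflib-based stand-ins, not the fuzzywuzzy functions, and every function name here is hypothetical; fuzzywuzzy's `fuzz.ratio` or `fuzz.token_set_ratio` also take two strings and return a 0-100 score, so they could be dropped into `SCORERS` instead.

```python
from difflib import SequenceMatcher


def plain_ratio(a: str, b: str) -> float:
    """Stand-in scorer (difflib, not fuzzywuzzy): 0-100 similarity."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def _sorted_words(s: str) -> str:
    return " ".join(sorted(s.lower().split()))


def sorted_token_ratio(a: str, b: str) -> float:
    """Stand-in scorer: sort the words first, so word order is ignored."""
    return 100 * SequenceMatcher(None, _sorted_words(a), _sorted_words(b)).ratio()


# Any callable taking two strings and returning a 0-100 score fits here.
SCORERS = [plain_ratio, sorted_token_ratio]


def candidate_duplicates(names, threshold=80):
    """Check the 1st name against every later name, then the 2nd, and so on.

    Starting j at i + 1 skips self-comparisons and mirrored pairs (B, A).
    A pair is kept when at least one scorer returns a value above threshold.
    """
    rows = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            best = max(score(names[i], names[j]) for score in SCORERS)
            if best > threshold:
                rows.append((names[i], names[j], best))
    return rows


# A plain list is used here; with pandas this might be df["name"].tolist()
# (the column name "name" is hypothetical).
names = ["John Smith", "Smith John", "Alice Brown"]
print(candidate_duplicates(names))  # [('John Smith', 'Smith John', 100.0)]
```

Here "John Smith" vs "Smith John" scores only 50 on the plain ratio but 100 on the word-sorted one, so the "at least one score above 80" rule keeps the pair. The rows can be turned into a DataFrame with `pd.DataFrame(rows, columns=["name_a", "name_b", "best_score"])` (hypothetical column names). The loop still makes n*(n-1)/2 comparisons per scorer, so a few thousand names already means millions of scorer calls.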

score 0 · Accepted Answer · answered Mar 24 '20 at 09:44


In the end I found a different approach to my issue. Instead of comparing the full 80k-name list against itself (80k vs 80k), I used itertools.combinations, which yields each unique pair of names exactly once, with no self-comparisons and no mirrored duplicates. That is perfect for this scenario.
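A minimal sketch of the `itertools.combinations` approach from the answer. The scorer is a difflib-based stand-in (an assumption, not fuzzywuzzy), and `similar_pairs` is a hypothetical name; any function taking two strings and returning a 0-100 score, such as fuzzywuzzy's `fuzz.token_set_ratio`, could be passed as `scorer`. The last lines compare the number of comparisons for 80k names.

```python
from difflib import SequenceMatcher
from itertools import combinations
from math import comb


def similarity(a: str, b: str) -> float:
    """Stand-in scorer (difflib, not fuzzywuzzy) on a 0-100 scale."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(names, scorer=similarity, threshold=80):
    """Yield (name_a, name_b, score) for every unordered pair above threshold.

    combinations(names, 2) yields each pair of positions exactly once, in
    order, and never pairs an element with itself.
    """
    for a, b in combinations(names, 2):
        score = scorer(a, b)
        if score > threshold:
            yield a, b, score


names = ["John Smith", "Jon Smith", "Alice Brown"]
matches = list(similar_pairs(names))
print([(a, b) for a, b, _ in matches])  # [('John Smith', 'Jon Smith')]

# Comparison counts for 80k names:
n = 80_000
print(n * n)       # 6400000000 comparisons for 80k vs 80k
print(comb(n, 2))  # 3199960000 unique pairs from combinations
```

Because `similar_pairs` is a generator, matches can be collected or written out without materialising all candidate pairs in memory; the number of scorer calls, roughly halved compared with the full cross-comparison, is still what dominates the runtime.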


cnns


