remove duplicates in list in column in Pandas

Question

This may be an unusual Pandas question.

I have a DataFrame like this:

Col1               Col2
['joe', 'joe']     ['joe']
['sam', 'bob']     ['sam', 'bob']
['mary', 'mary']   ['mary']

I want to use an apply function on Col1 to get the result in Col2. That is, lists that contain duplicates in Col1 should no longer contain them in Col2. I tried various functions with apply and set, with no luck. It seems like it should be straightforward, but it isn't. Or so it seems.

score 2 · Accepted Answer · answered Nov 12 '20 at 01:48

To get the second column:

df['Col2'] = df['Col1'].explode().groupby(level=0).unique()
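
As a minimal sketch, here is the explode/groupby/unique approach run on the sample data from the question. The DataFrame construction and the trailing `.apply(list)` are assumptions added for illustration: `unique()` returns a NumPy array per row, so the extra step is only needed if plain Python lists are wanted.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be real Python lists)
df = pd.DataFrame({"Col1": [["joe", "joe"], ["sam", "bob"], ["mary", "mary"]]})

# explode() gives one row per list element while repeating the original index;
# grouping on that index (level=0) and calling unique() drops the duplicates
deduped = df["Col1"].explode().groupby(level=0).unique()

# unique() yields NumPy arrays; convert back to lists (optional, an assumption)
df["Col2"] = deduped.apply(list)
print(df)
```

One caveat worth checking on real data: `explode()` turns an empty list into a single NaN, so an empty list in Col1 would not come back as an empty list in Col2.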

BENY


None can beat explode!! +1 – Wasif Nov 12 '20 at 01:50
I mean, @BEN_YO that worked, how do I say...Uh, perfectly! Thank you. That was fantastic. I am using explode in other places in this script but did not even conceive of that use of it. Fantastic!!! – John Taylor Nov 12 '20 at 01:53
@BEN_YO One last complication. I'm running your code and it results in lists with no duplicates. But the ultimate goal is to explode one more time across multiple columns. However, that explode produces multiple duplicate values, throwing off my count. Any suggestion on how to avoid duplicates when exploding multiple columns? – John Taylor Nov 12 '20 at 04:42
@JohnTaylor It is hard for me to understand what you need. Maybe open a new topic? – BENY Nov 12 '20 at 15:13
You’re right. I launched a new question on this and got a great answer. Thanks again. – John Taylor Nov 12 '20 at 15:14
@JohnTaylor more info here https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939 – BENY Nov 12 '20 at 15:17

score 0 · Answer 2 · answered Nov 12 '20 at 01:47

How about applying list(set(x)) to the column? A cool raw attempt ;-)

import pandas as pd

df = pd.DataFrame({
    'A': [[1, 2], [3, 4, 3], [6, 7, 8]]
})
# Deduplicate each list via a set; note that a set does not guarantee the original order
df['A'] = df['A'].apply(lambda x: list(set(x)))
print(df)
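
If the order of the items matters, a set may not keep it. A possible variation, offered as a sketch rather than as part of the original answer, is `dict.fromkeys`, which keeps the first occurrence of each item in insertion order (Python 3.7+). The column name `A_dedup` is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [[1, 2], [3, 4, 3], [6, 7, 8]]})

# dict.fromkeys keeps only the first occurrence of each item, in order;
# 'A_dedup' is a hypothetical column name used for illustration
df["A_dedup"] = df["A"].apply(lambda x: list(dict.fromkeys(x)))
print(df)
```

Like the set version, this requires the list items to be hashable.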

Still none can beat EXPLODE!!

df['A'].explode().groupby(level=0).unique()
