3

Say I have this dataframe:

col1  col2
'a'   [1,2,3]
'a'   [1,2,3]
'b'   [4,5,6]

and I want to drop the duplicates (in this case the first two rows). How would I accomplish this in a time-efficient, Pythonic manner? (My full dataframe has millions of rows and 7 columns.)

Hanuman95
  • Does this answer your question? [Pandas: unique dataframe](https://stackoverflow.com/questions/12322779/pandas-unique-dataframe) – woblob Oct 05 '20 at 17:39
  • 1
    lists are not hashable, so you cannot check for duplicates directly. You can convert lists to tuples and check for duplicates with Pandas as if they are numbers. That said, you would get minimal vectorization with this type of data. – Quang Hoang Oct 05 '20 at 17:40
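If you would rather not mutate the original column, the tuple-conversion idea from the comments can be sketched like this (the dataframe below is invented to match the question; `assign` builds a hashable copy of col2 only for the duplicate check, leaving the original lists intact):

```python
import pandas as pd

# Made-up dataframe mirroring the question
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
})

# Convert the list column to tuples on a temporary copy, then keep
# only the rows whose (col1, col2) combination has not been seen yet.
mask = ~df.assign(col2=df["col2"].map(tuple)).duplicated()
deduped = df[mask]
```

`deduped` keeps the first 'a' row and the 'b' row, and `df` itself is left unchanged.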

2 Answers

4

You can convert the list column to something hashable (e.g. tuples) and then drop the duplicates.

Note that inplace=True will overwrite your dataframe:

df["col2"] = df["col2"].transform(lambda k: tuple(k))  # lists -> hashable tuples
df.drop_duplicates(inplace=True)
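For reference, a minimal end-to-end sketch of this approach (the dataframe is made up to match the question; `.map(tuple)` is used here as an equivalent elementwise conversion, and the last line converts the column back to lists if the original type is needed):

```python
import pandas as pd

# Made-up dataframe mirroring the question
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
})

df["col2"] = df["col2"].map(tuple)  # lists -> hashable tuples
df.drop_duplicates(inplace=True)    # now the rows can be compared
df["col2"] = df["col2"].map(list)   # optional: restore the lists
```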
woblob
-1

Refer here for drop_duplicates information and examples.
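To make the link's content concrete, here is a short sketch of the most common `drop_duplicates` options (the dataframe is invented; `subset` limits which columns are compared and `keep` chooses which duplicate survives):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 1, 6]})

# Default: compare all columns, keep the first of each duplicate group.
first_only = df.drop_duplicates()

# Compare only col1, and keep the last occurrence instead of the first.
by_col1 = df.drop_duplicates(subset=["col1"], keep="last")
```

Note that this works directly only on hashable column values; a column of lists must first be converted, as the other answer shows.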

Raymond Toh