3

Say I have this dataframe:

col1  col2
'a'   [1,2,3]
'a'   [1,2,3]
'b'   [4,5,6]

and I want to drop the duplicates (in this case the first two rows). How would I accomplish this in a time-efficient, Pythonic manner? (My full dataframe has millions of rows and 7 columns.)

Hanuman95
  • Does this answer your question? [Pandas: unique dataframe](https://stackoverflow.com/questions/12322779/pandas-unique-dataframe) – woblob Oct 05 '20 at 17:39
  • 1
    lists are not hashable, so you cannot check for duplicates directly. You can convert lists to tuples and check for duplicates with Pandas as if they are numbers. That said, you would get minimal vectorization with this type of data. – Quang Hoang Oct 05 '20 at 17:40
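If you would rather not mutate the original column, the tuple-conversion idea from the comments can be sketched like this (the dataframe below is invented to match the question; `assign` builds a hashable copy of col2 only for the duplicate check, leaving the original lists intact):

```python
import pandas as pd

# Made-up dataframe mirroring the question
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
})

# Convert the list column to tuples on a temporary copy, then keep
# only the rows whose (col1, col2) combination has not been seen yet.
mask = ~df.assign(col2=df["col2"].map(tuple)).duplicated()
deduped = df[mask]
```

`deduped` keeps the first 'a' row and the 'b' row, and `df` itself is left unchanged.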

2 Answers

4

You can convert the list column to something hashable (e.g. tuples) and then drop the duplicates.

Note that inplace=True will overwrite your dataframe:

df["col2"] = df["col2"].transform(lambda k: tuple(k))  # lists -> hashable tuples
df.drop_duplicates(inplace=True)
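For reference, a minimal end-to-end sketch of this approach (the dataframe is made up to match the question; `.map(tuple)` is used here as an equivalent elementwise conversion, and the last line converts the column back to lists if the original type is needed):

```python
import pandas as pd

# Made-up dataframe mirroring the question
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
})

df["col2"] = df["col2"].map(tuple)  # lists -> hashable tuples
df.drop_duplicates(inplace=True)    # now the rows can be compared
df["col2"] = df["col2"].map(list)   # optional: restore the lists
```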
woblob
-1

Refer here for drop_duplicates information and examples.
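To make the link's content concrete, here is a short sketch of the most common `drop_duplicates` options (the dataframe is invented; `subset` limits which columns are compared and `keep` chooses which duplicate survives):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 1, 6]})

# Default: compare all columns, keep the first of each duplicate group.
first_only = df.drop_duplicates()

# Compare only col1, and keep the last occurrence instead of the first.
by_col1 = df.drop_duplicates(subset=["col1"], keep="last")
```

Note that this works directly only on hashable column values; a column of lists must first be converted, as the other answer shows.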

Raymond Toh