
In my pandas DataFrame, I have a column in which each row is a list with repeated values. For example, a DataFrame with 3 rows:

df = pd.DataFrame({'Column_1': [[1,2,3,2],[1,1,2],[1,2,3]]})

I want to remove the duplicates, so my expected output is something like [[1,2,3],[1,2],[1,2,3]]. How can I apply a set function to remove the duplicates in each of the lists?

Thanks in advance!

ipj
  • What you are looking for is how to remove duplicates from a list; see https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists – Jack Song Aug 03 '20 at 14:34
  • Right, but I want to apply a set function to the entries of the DataFrame's column directly. I am looking for an efficient way to do this in pandas. Thanks :) – Soumya Ranjan Sahoo Aug 03 '20 at 14:39

1 Answer


Given df:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Column_1': [[1,2,3,2],[1,1,2],[1,2,3]]})

Try the following (note that a set does not preserve the original order of the elements):

df.Column_1 = df.Column_1.apply(lambda r: list(set(r)))

or, if sorted NumPy arrays instead of lists are acceptable:

df.Column_1 = df.Column_1.apply(np.unique)

Result:

    Column_1
0  [1, 2, 3]
1     [1, 2]
2  [1, 2, 3]
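If the order in which values first appear matters, one variant worth considering (a sketch, not part of the original answer; the helper name dedupe_keep_order is hypothetical) uses dict.fromkeys, which keeps one key per distinct value in insertion order. It assumes the list elements are hashable, as the set approach already does. A plain list comprehension over the column is used here instead of Series.apply; for pure-Python per-row work like this it is typically about as fast or slightly faster, but measure on your own data.

```python
import pandas as pd


def dedupe_keep_order(values):
    """Return a new list with duplicates removed, keeping first occurrences.

    dict.fromkeys keeps one key per distinct value, and dicts preserve
    insertion order (guaranteed since Python 3.7), so the result follows
    the order in which values first appear. Assumes hashable elements.
    """
    return list(dict.fromkeys(values))


df = pd.DataFrame({'Column_1': [[1, 2, 3, 2], [1, 1, 2], [1, 2, 3]]})

# Build one deduplicated list per row and assign the results back to the column.
df['Column_1'] = [dedupe_keep_order(row) for row in df['Column_1']]
print(df)
```

Unlike list(set(r)), this keeps a row such as [3, 1, 3, 2] as [3, 1, 2] rather than returning the values in whatever order the set yields.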
ipj