Remove all duplicated values from each list in a DataFrame column

Question

I have the following DataFrame, and I want to remove the duplicated values from each list in the column num_ent.

The result should be the column num_ent, with no repeated values inside any list.

import pandas as pd

data = {'id': [287, 3345, 3967, 7083, 23607], 'num_ent': [[0, 1, 1, 2, 3, 4, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [0, 6, 7, 8, 9, 10, 10, 10, 11, 12, 13, 14, 15]]}

df = pd.DataFrame(data=data)

Starting DF

      id                                          num_ent
0    287                [0, 1, 1, 2, 3, 4, 3, 4, 5, 6, 7]
1   3345                         [1, 2, 3, 4, 5, 6, 7, 8]
2   3967   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3   7083   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
4  23607  [0, 6, 7, 8, 9, 10, 10, 10, 11, 12, 13, 14, 15]

Does this answer your question? [Drop duplicate list elements in column of lists](https://stackoverflow.com/questions/62872266/drop-duplicate-list-elements-in-column-of-lists) — AlexK, May 13 '21 at 06:03

score 1 · Answer 1 · answered May 13 '21 at 00:55

One issue might be that you are importing the column "num_ent" as strings instead of lists. One potential solution is:

df=pd.read_csv("test.txt", delimiter=";")
df
      id                    num_ent
0    287  [0,1,1,1,2,3,3,4,3,5,6,7]
1   3345        [1,2,3,4,5,6,7,8,9]
2   3967            [0,1,2,3,4,5,6]
3  23607    [0,6,7,8,9,10,10,10,11]

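# Parse each string such as "[0,1,1]" into a real Python list (eval executes whatever text the file contains)
df["num_ent"] = df["num_ent"].apply(eval)
# Keep only the first occurrence of each value, in order of appearance
df["num_ent"] = df["num_ent"].map(pd.unique)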
df
      id                      num_ent
0    287     [0, 1, 2, 3, 4, 5, 6, 7]
1   3345  [1, 2, 3, 4, 5, 6, 7, 8, 9]
2   3967        [0, 1, 2, 3, 4, 5, 6]
3  23607      [0, 6, 7, 8, 9, 10, 11]
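
A variation on the steps above, as a minimal sketch: it assumes the file really does hold list literals as text, swaps eval for ast.literal_eval (which only accepts Python literals), and wraps each list in a Series before deduplicating so the result can be turned back into a plain list. The csv_text string is a hypothetical stand-in for test.txt.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for "test.txt": a ';'-delimited file whose
# num_ent column stores list literals as text.
csv_text = """id;num_ent
287;[0,1,1,1,2,3,3,4,3,5,6,7]
3345;[1,2,3,4,5,6,7,8,9]
3967;[0,1,2,3,4,5,6]
23607;[0,6,7,8,9,10,10,10,11]
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";")

# Each cell is a string at this point; literal_eval turns it into a list
# and rejects anything that is not a plain Python literal.
df["num_ent"] = df["num_ent"].apply(ast.literal_eval)

# Series.unique keeps values in order of first appearance and returns a
# NumPy array, so tolist() converts it back to a regular list.
df["num_ent"] = df["num_ent"].map(lambda values: pd.Series(values).unique().tolist())

print(df)
```

If the column already contains lists (as in the question's own DataFrame), the parsing step is unnecessary and only the deduplication line applies.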

score 0 · Answer 2 · answered May 13 '21 at 00:18

A simple solution is to convert each list to a set, then back to a list. Note that a set does not guarantee that the original element order is kept.

df['num_ent'] = df.apply(lambda x: list(set(x['num_ent'])), axis=1)

Output

      id                                         num_ent
0    287                        [0, 1, 2, 3, 4, 5, 6, 7]
1   3345                        [1, 2, 3, 4, 5, 6, 7, 8]
2   3967  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3   7083  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
4  23607         [0, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
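
If the original order of the elements matters, one possible alternative is dict.fromkeys, since dictionaries keep insertion order in Python 3.7+. This is a sketch and not part of the answer above; the third row (id 1) is a hypothetical unsorted example added only to make the ordering visible.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [287, 23607, 1],
    "num_ent": [
        [0, 1, 1, 2, 3, 4, 3, 4, 5, 6, 7],
        [0, 6, 7, 8, 9, 10, 10, 10, 11, 12, 13, 14, 15],
        [3, 1, 3, 2, 1],  # hypothetical row, not from the question
    ],
})

# dict.fromkeys builds a dict whose keys are the list items; duplicate keys
# collapse and the first-seen order is preserved.
df["num_ent"] = df["num_ent"].apply(lambda values: list(dict.fromkeys(values)))

print(df)
```

This requires the list elements to be hashable, which is also true of the set-based approach.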

score 0 · Answer 3 · answered May 13 '21 at 00:22

Pass each list through a set and return a new list.

df = pd.DataFrame({'id':[1,2], 'num_ent': [[1,1,2,3,4,4,5], [2,6,4,4,4,7,8]]})


   id                num_ent
0   1  [1, 1, 2, 3, 4, 4, 5]
1   2  [2, 6, 4, 4, 4, 7, 8]


df.num_ent = df.num_ent.apply(lambda x: list(set(x)))

   id          num_ent
0   1  [1, 2, 3, 4, 5]
1   2  [2, 4, 6, 7, 8]
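
The output above happens to come back in ascending order, but that is not something a set promises. If sorted output is what you want, a small sketch that makes it explicit with sorted (using the same example data as this answer):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "num_ent": [[1, 1, 2, 3, 4, 4, 5], [2, 6, 4, 4, 4, 7, 8]],
})

# set() drops the duplicates; sorted() returns a new list in ascending order,
# so the result does not depend on the set's internal iteration order.
df["num_ent"] = df["num_ent"].apply(lambda values: sorted(set(values)))

print(df)
```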
