Remove duplicate items from lists in Python lists

Question

I want to remove items that are duplicated across the sublists of a list in Python, keeping only the first occurrence of each value.

Example:

myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]

should become

myList = [[1,2,3], [4,5,6], [7,8,9], [0]]

I tried this code:

myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]

nbr = []  # values seen so far

for x in myList:
    for i in x:
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)

But some duplicate items are not deleted.

The result is: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]

The number 4 is still repeated.

Try not to modify a list while you are iterating over it; try `for i in x.copy():` instead. – Matiiss, Mar 20 '22 at 07:18
As @Matiiss said, you are iterating over the very list you are removing from. Use copy() to iterate over a copy of the list and delete from the original. Adding print() calls before append() and remove() lets you see what actually happens. – Ali Jibran, Mar 20 '22 at 07:24

Matiiss · Accepted Answer · 2022-03-20T07:31:23.637

score 5

You iterate over a list that you are also modifying:

...
    for i in x:
        ...
        x.remove(i)

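Removing an element shifts every later element one position to the left, but the loop's internal index still moves forward, so the element that slid into the removed element's position is skipped on the next iteration.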
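To see the skipping in isolation, here is a small sketch that is not taken from the question: the list and the "remove even numbers" condition are hypothetical, chosen only to show which elements the loop actually visits while the same list is being modified.

```python
# Hypothetical example: try to remove every even number while iterating
# over the very list being modified.
x = [2, 4, 5, 6]
visited = []  # records which elements the loop actually sees

for i in x:
    visited.append(i)
    if i % 2 == 0:
        x.remove(i)  # shifts later elements left; the loop index still advances

print(visited)  # [2, 5, 6]  (4 is never visited)
print(x)        # [4, 5]
```

Because 4 slides into position 0 after 2 is removed, while the loop has already moved on to position 1, 4 survives even though it is even.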

The solution is to create a shallow copy of the list and iterate over that while modifying the original list:

...
    for i in x.copy():
        ...
        x.remove(i)
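Putting that change into the complete code from the question gives the following sketch. Everything except the iteration target (now `x.copy()`) is the OP's original loop.

```python
myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]
nbr = []  # values seen so far, across all sublists

for x in myList:
    # Iterate over a shallow copy so that x.remove() cannot shift
    # elements out from under the loop.
    for i in x.copy():
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)  # removes the first occurrence of i from the original list

print(myList)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
```

One caveat: list.remove() deletes the *first* matching element. If a single sublist contains the same value twice, e.g. [[3, 1, 3]], the earlier 3 is the one deleted and the result is [[1, 3]] rather than [[3, 1]]. Rebuilding each sublist, as in the other answer, preserves the original order.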

edited Mar 20 '22 at 07:31

answered Mar 20 '22 at 07:22

Matiiss


I had just finished debugging the OP's code and was going to give the same explanation, so a +1 is due – FLAK-ZOSO Mar 20 '22 at 07:23

kcsquared · Answer 2 · 2022-03-20T08:18:20.460

score 5

You can make this much faster by:

1. Using a set for repeated membership testing instead of a list, and
2. Rebuilding each sublist rather than repeatedly calling list.remove() (a linear-time operation, each time) in a loop.

seen = set()

for i, sublist in enumerate(myList):
    new_list = []

    for x in sublist:
        if x not in seen:
            seen.add(x)
            new_list.append(x)

    myList[i] = new_list

>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
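One design detail: `myList[i] = new_list` replaces each inner list with a new object, whereas the OP's `x.remove(i)` mutated the existing inner lists. If other code holds references to those inner lists (the `alias` variable below is a hypothetical example of that), a variant using slice assignment, `sublist[:] = new_list`, keeps the same rebuild approach but updates each inner list in place:

```python
myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]
alias = myList[3]  # hypothetical second reference to the last sublist

seen = set()
for sublist in myList:  # the outer list itself is never modified here
    new_list = []
    for x in sublist:
        if x not in seen:
            seen.add(x)
            new_list.append(x)
    sublist[:] = new_list  # overwrite the contents of the existing inner list

print(myList)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
print(alias)   # [0]
```

With `myList[i] = new_list` instead, `alias` would still be [0, 2, 4].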

If you want mild speed gains and moderate readability loss, you can also write this as:

seen = set()

for i, sublist in enumerate(myList):
    myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]
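The one-liner relies on `set.add()` always returning `None`, which is falsy. For a value not yet in `seen`, `x in seen` is `False`, so `or` goes on to evaluate `seen.add(x)` (adding the value) and yields `None`, and `not None` is `True`. For a value already in `seen`, `x in seen` is `True`, `or` short-circuits, and `not True` is `False`. A small sketch on made-up values:

```python
seen = set()
print(seen.add("a"))  # None: set.add() mutates the set and returns None

values = [3, 1, 3, 2, 1]  # hypothetical input with repeats
seen = set()
# True marks the first occurrence of each value; False marks a repeat.
flags = [not (x in seen or seen.add(x)) for x in values]
print(flags)         # [True, True, False, True, False]
print(sorted(seen))  # [1, 2, 3]
```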

edited Mar 20 '22 at 08:18

answered Mar 20 '22 at 07:32

kcsquared

The two points perfectly explain why this should be the way to go. It might be good though to show how this would be done without the `x in seen or seen.add(x)` 'trick'. – Thierry Lathuille Mar 20 '22 at 07:36
@ThierryLathuille Added, thanks for the feedback. I haven't measured the performance, but it's probably almost the same, so not much reason to use the trick. The new version is also about 10x clearer, IMO. – kcsquared Mar 20 '22 at 07:57

score 0 · Answer 3 · answered Mar 20 '22 at 08:00

Why you got the wrong answer: in your code, after scanning the first three sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]. Now x = [0, 2, 4]. The duplicate is detected when i = x[1], so 2 is removed and x = [0, 4]. The loop then moves on to index 2, which no longer exists, so the for loop stops and 4 is never checked.
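Following the suggestion in the comments to add print() calls, this sketch reruns the question's original (buggy) loop and prints the elements the inner loop actually visits in each sublist. The `visited` list is added for illustration only; the logic is otherwise unchanged.

```python
myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]
nbr = []

for x in myList:
    visited = []  # illustration only: elements the inner loop actually sees
    for i in x:   # original loop: iterates over x while removing from it
        visited.append(i)
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)
    print("visited", visited, "-> sublist is now", x)

print(myList)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]
```

The last "visited" line shows [0, 2] for the sublist [0, 2, 4]: once 2 is removed, the loop ends before it ever reaches 4.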

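Optimizations have been proposed in other answers. In general, a list is only efficient for accessing elements by index and for appending or removing at the end; membership tests with `in` and `list.remove()` both scan the list in linear time.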
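To make the membership-test point concrete, here is a rough timing sketch using the standard `timeit` module. The sizes and repeat count are arbitrary choices, and the absolute numbers will vary by machine; only the relative gap is of interest.

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # worst case for the list: every element is compared

# `in` on a list scans element by element; on a set it is a hash lookup.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```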
