1

I want to remove items that are duplicated across the sublists of a list in Python, keeping only the first occurrence of each value.

Example:

  • myList = [[1,2,3], [4,5,6,3], [7,8,9], [0,2,4]]

to

  • myList = [[1,2,3], [4,5,6], [7,8,9], [0]]

I tried this code:

myList = [[1,2,3],[4,5,6,3],[7,8,9], [0,2,4]]
 
nbr = []

for x in myList:
    for i in x:     
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)
    

But some duplicates are not removed.

For example, I get: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]

The number 4 still appears twice.

Cabri
  • 2
    Try not to modify a list you are also iterating over. Try `for i in x.copy():` instead. – Matiiss Mar 20 '22 at 07:18
  • As @Matiiss said, you are iterating over the very list you are modifying. Use copy() to iterate over a copy of the list while deleting from the original. Add print() calls before the append and remove to see what actually happens. – Ali Jibran Mar 20 '22 at 07:24

3 Answers

5

You iterate over a list that you are also modifying:

...
    for i in x:
        ...
        x.remove(i)

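That means the loop may skip an element on the next iteration: removing an item shifts the later items one position to the left, while the loop's internal index still advances by one.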

The solution is to create a shallow copy of the list and iterate over that while modifying the original list:

...
    for i in x.copy():
        ...
        x.remove(i)
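
Applied to the list from the question, the fix could look like the following minimal sketch. It keeps the question's variable names and changes only the inner loop header:

```python
myList = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]

nbr = []  # values already seen in any sublist

for x in myList:
    # Iterate over a shallow copy so that removing from x
    # does not shift the elements still to be visited.
    for i in x.copy():
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)

print(myList)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
```

Note that `x.remove(i)` deletes the first occurrence of `i` in `x`, so if a single sublist contained the same value twice, the surviving copy would be the later one and the order inside that sublist could change.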
Matiiss
  • I literally just finished debugging the OP's code and was going to give the same explanation, so a +1 is due – FLAK-ZOSO Mar 20 '22 at 07:23
5

You can make this much faster by:

  1. Using a set for repeated membership testing instead of a list, and
  2. Rebuilding each sublist rather than repeatedly calling list.remove() (a linear-time operation, each time) in a loop.
seen = set()

for i, sublist in enumerate(myList):
    new_list = []

    for x in sublist:
        if x not in seen:
            seen.add(x)
            new_list.append(x)

    myList[i] = new_list
>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]

If you want mild speed gains and moderate readability loss, you can also write this as:

seen = set()

for i, sublist in enumerate(myList):
    myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]
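
The one-liner relies on `set.add` always returning `None`, which is falsy. A small sketch of how the condition evaluates, using a helper name `first_time` that is hypothetical and only introduced here for illustration:

```python
seen = set()

def first_time(x):
    # First call for x: `x in seen` is False, so `or` evaluates seen.add(x),
    # which stores x and returns None; `not (False or None)` is True.
    # Later calls: `x in seen` is True, `or` short-circuits, result is False.
    return not (x in seen or seen.add(x))

print(first_time(3))  # True
print(first_time(3))  # False
print(seen)           # {3}
```

The side effect hidden inside the condition is what costs readability, which is why the explicit loop is usually the easier version to maintain.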
kcsquared
  • 1
    The two points perfectly explain why this should be the way to go. It might be good though to show how this would be done without the `x in seen or seen.add(x)` 'trick'. – Thierry Lathuille Mar 20 '22 at 07:36
  • @ThierryLathuille Added, thanks for the feedback. I haven't measured the performance, but it's probably almost the same, so not much reason to use the trick. The new version is also about 10x clearer, IMO. – kcsquared Mar 20 '22 at 07:57
0

Why you got the wrong answer: after your code scans the first three sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]. Now x = [0, 2, 4]. The duplicate 2 is detected when i = x[1] and is removed, so x = [0, 4]. The loop then moves on to index 2, which no longer exists, so the for loop stops without ever checking 4.
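
The skip can be reproduced in isolation. A minimal sketch, assuming `nbr` already holds the values collected from the first three sublists; the `visited` list is not in the original code and is added here only to record which elements the loop actually reached:

```python
nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
x = [0, 2, 4]
visited = []

for i in x:
    visited.append(i)
    if i not in nbr:
        nbr.append(i)
    else:
        x.remove(i)  # shrinks x while the loop's internal index keeps advancing

print(visited)  # [0, 2]
print(x)        # [0, 4]
```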

Optimizations have been proposed in the other answers. In general, a list is efficient only for retrieving elements by index and for appending or removing at the end; membership tests and removals elsewhere take linear time.
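
To get a rough feel for the difference, membership tests on a list and on a set can be timed with the standard `timeit` module. This is only a sketch: the size `n`, the repeat count, and the variable names are arbitrary choices, and the absolute numbers depend on the machine.

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present, so the list has to be scanned in full

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```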

yzhang