Duplicate removal from a list in Python

Question

I am trying to remove duplicates from a list. The input list has four values: 2, 2, 3, 3. After running the code below, I get 2, 3, 3. As I understand it, the loop should run four times, but after the second 2 is removed it only runs three times. Is that what is causing the problem? Can someone help me understand what is going on?

 list = [2,2,3,3]
 duplicate = 0
 for numbers in list:
     if duplicate == numbers:
         list.remove(numbers)
     else:
         duplicate = numbers
 print(f'List after duplicates removed {list}')

I expect the result 2, 3, but the code gives 2, 3, 3.

score 0 · Answer 1 · answered Sep 17 '22 at 05:46

If you don't care about the order of values, pass the list to set:

>>> lst = [2,2,3,3]
>>> set(lst)
{2, 3}

If you do care about the order of the values, start with an empty list and append each value only if it is not already there:

res = []
for i in lst:
    if i not in res:
        res.append(i)
    
res
[2, 3]
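For long lists the `i in res` check gets slow, because it scans the whole result list every time. A minimal sketch of the same idea with a set for the membership test, assuming all values are hashable (the function name `dedupe_preserving_order` is made up for this example):

```python
def dedupe_preserving_order(values):
    """Return a new list with duplicates removed, keeping first occurrences."""
    seen = set()    # values already emitted; set lookups are fast on average
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


print(dedupe_preserving_order([2, 2, 3, 3]))  # [2, 3]
```

The original list is left untouched, so nothing is removed while iterating.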

Claudio · Answer 2 · 2022-09-17T16:37:23.287

Like many others, you have run into the trap of deleting items from a list while iterating over it. This has side effects that are hard to understand if you are new to Python.

What happens in your for numbers in list: loop after you delete an item? The loop keeps advancing its internal position as if nothing had been deleted, but the remaining elements have shifted one place to the left, so the element right after the deleted one is skipped and never reaches the loop body. The skipped duplicates are never checked and therefore still show up in the result.
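To make the skipping visible, here is a small sketch that records which positions the loop body actually sees, followed by one possible fix that iterates over a copy. The names `visited` and `fixed` are only for illustration; the logic is otherwise the one from the question.

```python
# Trace which elements the loop visits while the list shrinks underneath it.
lst = [2, 2, 3, 3]
duplicate = 0
visited = []  # (position, value) pairs seen by the loop body
for position, number in enumerate(lst):
    visited.append((position, number))
    if duplicate == number:
        lst.remove(number)  # everything after the removed item shifts left
    else:
        duplicate = number

print(visited)  # [(0, 2), (1, 2), (2, 3)]
print(lst)      # [2, 3, 3]

# Iterating over a copy keeps the loop independent of the removals.
fixed = [2, 2, 3, 3]
duplicate = 0
for number in fixed[:]:
    if duplicate == number:
        fixed.remove(number)
    else:
        duplicate = number

print(fixed)  # [2, 3]
```

The loop body runs only three times, and the first 3 is never compared with anything. Note that even the fixed version, like the original, only catches duplicates that sit next to each other.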

By the way: you will run into trouble if you use built-in names for your variables. Give the variable list in your code another name, for example lst = [2,2,3,3].
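A quick sketch of the kind of trouble meant here. The shadowing is kept inside a function (the name `shadowing_demo` is made up) so that the built-in stays usable elsewhere:

```python
def shadowing_demo():
    list = [2, 2, 3, 3]  # from here on, "list" is this local variable, not the built-in type
    try:
        return list((1, 2))  # tries to call the list object instead of the type
    except TypeError as exc:
        return str(exc)


print(shadowing_demo())  # 'list' object is not callable
```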

In Python versions < 3.7, the usual way to remove duplicates from a list is to turn the list into a set and the set back into a list. Note that this does not preserve the original order of the elements:

L =                        [1, 2, 3, 8, 5, 2, 4, 3, 2, 1, 9]
list(set(L)) =             [1, 2, 3, 4, 5, 8, 9]
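The result above happens to look sorted, but a set makes no promise about order. If a sorted result is what you actually want, a hedged suggestion is to ask for it explicitly:

```python
L = [1, 2, 3, 8, 5, 2, 4, 3, 2, 1, 9]

# set() drops the duplicates, sorted() returns a list in a defined order
L_sorted_unique = sorted(set(L))
print(L_sorted_unique)  # [1, 2, 3, 4, 5, 8, 9]
```

This assumes the elements are hashable and can be compared with each other.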

Since Python 3.7, dictionaries are guaranteed to preserve insertion order, so the best way to eliminate duplicates from a list while keeping the order is:

`L_unique = list(dict.fromkeys(L))`
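As a runnable check of that one-liner on the list used in this answer and on the list from the question (assuming Python 3.7 or newer and hashable elements):

```python
L = [1, 2, 3, 8, 5, 2, 4, 3, 2, 1, 9]

# dict keys are unique and keep first-insertion order
L_unique = list(dict.fromkeys(L))
print(L_unique)  # [1, 2, 3, 8, 5, 4, 9]

print(list(dict.fromkeys([2, 2, 3, 3])))  # [2, 3]
```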

And if you need to preserve the order of the elements in Python versions < 3.7, use unique_everseen(L) from more_itertools:

from more_itertools import unique_everseen
list(unique_everseen(L)) = [1, 2, 3, 8, 5, 4, 9]

If you don't want to install more_itertools, or want a pure Python solution without importing anything, here is ready-to-use code for the unique_everseen() function:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    def filterfalse(predicate, iterable):
        # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
        if predicate is None:
            predicate = bool
        for x in iterable:
            if not predicate(x):
                yield x
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
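To show what the key argument is for, here is a condensed sketch of the same technique. It is not the more_itertools implementation, and the name `unique_by_key` is made up; it just reproduces the two examples from the comments above:

```python
def unique_by_key(iterable, key=None):
    """Yield elements in order, skipping any whose key was already seen."""
    seen = set()
    for element in iterable:
        marker = element if key is None else key(element)
        if marker not in seen:
            seen.add(marker)
            yield element


print(list(unique_by_key("AAAABBBCCDAABBB")))     # ['A', 'B', 'C', 'D']
print(list(unique_by_key("ABBCcAD", str.lower)))  # ['A', 'B', 'C', 'D']
```

With key=str.lower, 'C' and 'c' count as the same element and only the first one is kept.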

And here is the code used to print the lines shown above in the text of the answer (the = specifier in f-strings requires Python 3.8+):

L = [1,2,3,8,5,2,4,3,2,1,9]
print(f'  {L =                        }')
print(f'  {list(set(L)) =             }')
from more_itertools import unique_everseen
print(f'  {list(unique_everseen(L)) = }')
