Remove duplicate numbers from a list

Question

I was attempting to remove all duplicated numbers in a list.

I was trying to understand what is wrong with my code.

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers)

The result I got was:

[1, 1, 6, 5, 2, 3]

Homework?!?!? GRRRR... Just kidding. You should attempt this problem by maintaining a set of "seen" numbers. Add a number you have not seen to the set, and remove a number you already have, from the list. Note that removing elements from a list in-place is a bad idea, so create a new one. — cs95, Apr 17 '19 at 09:21
@cs95 Hmm I am aware of the alternative method of doing this but just want to better understand on how it works haha.... Maybe I need to change my perspective of looking at things since ive been doing science stuff all my life hahaha — Linsanity, Apr 17 '19 at 09:24
Possible duplicate of [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) — Mahir Islam, Apr 17 '19 at 09:28
Possible duplicate of [How do you remove duplicates from a list whilst preserving order?](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order) — Kamal Nayan, Apr 17 '19 at 09:32
Relevant sopython canonical question: https://sopython.com/canon/95/removing-items-from-a-list-while-iterating-over-the-list/ — TrebledJ, Apr 17 '19 at 09:38
I like how there aren't any inquisitive statements in the question, yet only a couple people *really* answered what is stated to be the concern: '*what is wrong with my code*'. — TrebledJ, Apr 17 '19 at 09:43

DirtyBit · Answer 1 · 2019-04-17T10:03:24.770

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

Using a shallow copy of the list:

for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers)                                 # [1, 6, 5, 2, 3]

Alternatives:

Preserving the order of the list:

Using dict.fromkeys() (dicts keep insertion order in Python 3.7+):

print(list(dict.fromkeys(numbers).keys()))     # [1, 6, 5, 2, 3]

Using more_itertools.unique_everseen(iterable, key=None):

from more_itertools import unique_everseen
print(list(unique_everseen(numbers)))          # [1, 6, 5, 2, 3]

Using pandas.unique:

import pandas as pd
print(pd.unique(numbers).tolist())             # [1, 6, 5, 2, 3]

Using collections.OrderedDict([items]):

from collections import OrderedDict
print(list(OrderedDict.fromkeys(numbers)))   # [1, 6, 5, 2, 3]

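Using itertools.groupby(iterable[, key]) (this collapses only consecutive duplicates, which is enough here because equal values are adjacent):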

from itertools import groupby
print([k for k,_ in groupby(numbers)])       # [1, 6, 5, 2, 3]
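One caveat worth checking before relying on groupby: it only merges runs of adjacent equal items. The sketch below uses a made-up list, `scattered` (an assumption for illustration, not from the question), to show the difference from dict.fromkeys.

```python
from itertools import groupby

# Hypothetical input where the duplicates are not adjacent.
scattered = [1, 2, 1, 2, 2]

# groupby collapses consecutive runs only, so the later 1 and 2 survive.
collapsed = [k for k, _ in groupby(scattered)]
print(collapsed)  # [1, 2, 1, 2]

# dict.fromkeys removes every repeat and keeps first-seen order.
deduped = list(dict.fromkeys(scattered))
print(deduped)  # [1, 2]
```

Sorting the input first would make groupby remove every duplicate, at the cost of losing the original order.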

Ignoring the order of the list:

Using numpy.unique:

import numpy as np
print(np.unique(numbers).tolist())            # [1, 2, 3, 5, 6]

Using set():

print(list(set(numbers)))                     # [1, 2, 3, 5, 6]

Using frozenset([iterable]):

print(list(frozenset(numbers)))               # [1, 2, 3, 5, 6]
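The orders shown in the comments for set() and frozenset() happen to come out sorted for these small integers, but set iteration order is not something to rely on. If a sorted result is what is actually wanted, a minimal sketch is to sort explicitly (the name `unique_sorted` is only illustrative):

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

# sorted() returns a new ascending list, whatever order the set iterates in.
unique_sorted = sorted(set(numbers))
print(unique_sorted)  # [1, 2, 3, 5, 6]
```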

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

I guess the idea is to write the code yourself without using library functions. In that case I would still suggest using an additional set to store the items seen so far, so that you go over the list only once:

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
unique = set()
for x in numbers:
    if x not in unique:
        unique.add(x)
numbers = list(unique)
print(numbers)
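The loop above ends with list(unique), which discards the original order. A small variation, sketched here with the hypothetical names `seen` and `result`, appends each first occurrence to a new list so the order is kept. This is roughly the approach cs95 describes in the comments on the question.

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

seen = set()   # values already encountered
result = []    # first occurrences, in their original order
for x in numbers:
    if x not in seen:
        seen.add(x)
        result.append(x)

print(result)  # [1, 6, 5, 2, 3]
```

Membership tests on a set are constant time on average, so the whole pass stays linear in the length of the list.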

If you want to keep your own code, the problem is that you modify the collection inside the for loop that iterates over it, which is a big no-no in most programming languages. Python allows it, but the problem and its solution are already described in this answer to How to remove items from a list while iterating?:

Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
   if x < 0: a.remove(x)
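To see the skipping described in the quoted note, here is a small sketch. The lists `a` and `b` hold made-up values, chosen so that the effect shows up.

```python
# In-place iteration: removing -1 shifts -2 into index 0,
# which the loop has already passed, so -2 is never visited.
a = [-1, -2, 3]
for x in a:
    if x < 0:
        a.remove(x)
print(a)  # [-2, 3]

# Iterating over a slice copy visits every original element.
b = [-1, -2, 3]
for x in b[:]:
    if x < 0:
        b.remove(x)
print(b)  # [3]
```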

score 1 · Answer 3 · answered Apr 17 '19 at 09:24

Why don't you simply use a set:

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(set(numbers))
print(numbers)

answered Apr 17 '19 at 09:24

Kamal Nayan


That won't preserve the ordering though. This might be okay or not, OP does not specify ^^ – spectras Apr 17 '19 at 09:27

Chayemor · Answer 4 · 2019-04-17T09:38:58.197

Before anything else, my first piece of advice is never to modify a list while you are looping over it. All kinds of wacky stuff happens. Apart from that, your logic is fine (I recommend reading the other answers though; there's an easier way to do this with a set, which pretty much handles the duplicates for you).

Instead of removing numbers from the list you are looping over, loop over a copy of it, made with slicing directly in the for statement.

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
    print(numbers)
print("Final")          
print(numbers)

The key is numbers[:], which returns a shallow copy of the list. Here's the print output:

[1, 1, 1, 6, 5, 5, 2, 3]
[1, 1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
Final
[1, 6, 5, 2, 3]

Leaving a placeholder here until I figure out how to explain why in your particular case it's not working, like the actual step by step reason.

Another way to solve this, making use of the beautiful language that is Python, is through a comprehension and a set.

Why a set? Because by definition its elements are unique, so even if you try to put in several equal elements, they appear only once in the set. Cool, right?

A comprehension is syntactic sugar for looping in one line. Get used to it with Python: you'll either use it a lot or see it a lot :)

A comprehension iterates over an iterable and yields each item. The code below is a set comprehension (note the curly braces): x represents each number in numbers and becomes an element of the set. Because the set handles duplicates... voilà, your code is done. Note that a set does not keep the original order.

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers_as_set = {x for x in numbers}
print(numbers_as_set)

seralouk · Answer 5 · 2019-04-17T09:25:58.943

This seems like homework but here is a possible solution:

import numpy as np

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
# np.unique returns the sorted unique values; tolist() converts them to plain Python ints
filtered = np.unique(numbers).tolist()

print(filtered)
# [1, 2, 3, 5, 6]

This solution does not preserve the ordering. If you also need to keep the order, use:

filtered_with_order = list(dict.fromkeys(numbers))

edited Apr 17 '19 at 09:25

answered Apr 17 '19 at 09:24


Using numpy for such a small task is not a good solution. Also, she is a beginner in Python, so this solution won't help her – Kamal Nayan Apr 17 '19 at 09:25

score 0 · Answer 6 · answered Apr 17 '19 at 09:27

Why don't you use fromkeys?

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(dict.fromkeys(numbers))

Output: [1,6,5,2,3]

answered Apr 17 '19 at 09:27

Marty


Also: This keeps the original order – Marty Apr 17 '19 at 09:33

score 0 · Answer 7 · answered Apr 17 '19 at 09:28

The flow is as follows.

The list starts as [1, 1, 1, 1, 6, 5, 5, 2, 3] and the index is 0, so x is 1. numbers.count(1) is 4, so a 1 is removed (remove() deletes the first match, here the one at index 0).

The list is now [1, 1, 1, 6, 5, 5, 2, 3], but the index still advances to 1. x is 1 again, numbers.count(1) is 3, and another 1 is removed.

The list is now [1, 1, 6, 5, 5, 2, 3] and the index advances to 2, so x is 6. The two remaining 1's sit at indexes 0 and 1, which the loop has already passed.

etc...

So that's why two 1's are left in the result.

Please correct me if I am wrong. Thanks!
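The walk-through above can be reproduced with an explicit index. The sketch below only approximates how a for loop advances over a list; it is not CPython's actual implementation. The names `i` and `trace` are illustrative, with `i` standing in for the internal counter.

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

i = 0       # stand-in for the loop's internal position counter
trace = []  # (index, value seen, list after this step)
while i < len(numbers):
    x = numbers[i]
    if numbers.count(x) >= 2:
        numbers.remove(x)  # deletes the first match; later items shift left
    trace.append((i, x, list(numbers)))
    i += 1  # the counter advances even though the list just got shorter

for step in trace:
    print(step)
print(numbers)  # [1, 1, 6, 5, 2, 3]
```

The trace has only six steps for a nine-element list, because each removal shortens the list while the counter keeps moving forward.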

score 0 · Answer 8 · answered Apr 17 '19 at 09:35

A fancy method is to use collections.Counter:

>>> from collections import Counter
>>> numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> c = Counter(numbers)
>>> list(c.keys())
[1, 6, 5, 2, 3]

This method has linear time complexity (O(n)) and relies on collections.Counter from the standard library.
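As a side note, a Counter also keeps the counts, so the same object can report which values were duplicated. A minimal sketch, assuming Python 3.7+ (where Counter keys keep first-insertion order); the names `unique` and `repeated` are illustrative:

```python
from collections import Counter

numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
c = Counter(numbers)

unique = list(c)                               # keys in first-seen order
repeated = [n for n, k in c.items() if k > 1]  # values that occurred more than once

print(unique)    # [1, 6, 5, 2, 3]
print(repeated)  # [1, 5]
```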

score 0 · Answer 9 · answered Apr 17 '19 at 09:38

You can try:

from more_itertools import unique_everseen
items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
list(unique_everseen(items))

or

>>> from collections import OrderedDict
>>> items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> list(OrderedDict.fromkeys(items))
[1, 6, 5, 2, 3]

You can find more in How do you remove duplicates from a list whilst preserving order?
