How to remove all duplicate items from a list

Question

How would I use python to check a list and delete all duplicates? I don't want to have to specify what the duplicate item is - I want the code to figure out if there are any and remove them if so, keeping only one instance of each. It also must work if there are multiple duplicates in a list.

For example, in my code below, the list lseparatedOrbList has 12 items - one is repeated six times, one is repeated five times, and there is only one instance of one. I want it to change the list so there are only three items - one of each, and in the same order they appeared before. I tried this:

for i in lseparatedOrbList:
   for j in lseparatedOrblist:
        if lseparatedOrbList[i] == lseparatedOrbList[j]:
            lseparatedOrbList.remove(lseparatedOrbList[j])

But I get the error:

Traceback (most recent call last):
  File "qchemOutputSearch.py", line 123, in <module>
    for j in lseparatedOrblist:
NameError: name 'lseparatedOrblist' is not defined

I'm guessing because it's because I'm trying to loop through lseparatedOrbList while I loop through it, but I can't think of another way to do it.

A common question: http://stackoverflow.com/search?q=python+duplicates+list. — S.Lott, Jul 20 '11 at 19:56
Your error is due to a simple typo: You didn't capitalize the second "l" in lseparatedOrbList in your second for loop. — ApproachingDarknessFish, Jan 18 '17 at 18:32

score 141 · Answer 1 · edited Apr 25 '19 at 05:26

141

Use set():

woduplicates = set(lseparatedOrblist)

Returns a set without duplicates. If you, for some reason, need a list back:

woduplicates = list(set(lseperatedOrblist))

This will, however, have a different order than your original list.

edited Apr 25 '19 at 05:26

Elliot Cameron

5,235
2
27
34

answered Jul 20 '11 at 16:05

Jacob

41,721
6
79
81

20

It is worth noting that this will fail if you have either lists or sets in your list. – Slater Victoroff Sep 12 '13 at 18:25
43

You should also explicitly note that this doesn't preserve element order. – Michał Górny Nov 21 '16 at 14:22
2

This wont work on every cases, wont work on a list of dict for instance – Aurele Collinet Jul 21 '20 at 12:25

score 103 · Accepted Answer · edited Mar 23 '21 at 17:37

103

Just make a new list to populate, if the item for your list is not yet in the new list input it, else just move on to the next item in your original list.

for i in mylist:
  if i not in newlist:
    newlist.append(i)

edited Mar 23 '21 at 17:37

gbeaven

1,522
2
20
40

answered Jul 20 '11 at 16:06

Jonathon Vandezande

2,446
3
22
31

Works great and maintains the order, thanks! – laplacian Jul 20 '11 at 16:19
15

Good, I guess I haven't forgotten all my python, its only been two years. Just as a word of warning, I am pretty sure this is an O(n^2) operation so you might not want to use it on large lists (eg. 10,000 items). If you need it for big lists, I would create a hash table to check against (O(1), yielding an overall O(n) implementation), instead of checking against the list, but if you are dealing with large lists, I probably wouldn't want to use python then either. – Jonathon Vandezande Jul 20 '11 at 16:31
Yeah, the list shouldn't really be more than like fifteen so it's fine. – laplacian Jul 20 '11 at 16:48
4

The correct way to do this is to use a set(), see cilaris's answer below. – Wes Mason Sep 29 '11 at 14:24
4

What do you mean this is not correct way? This does the job that was asked for, without any overhead of creating a set. – Jonathon Vandezande Dec 07 '11 at 00:49
6

creating a set messes up the order – alvas Mar 03 '13 at 12:18
7

This maintains order and also works with non-hashable list items which is a plus. – Slater Victoroff Sep 12 '13 at 18:26
1

Also, a set will not work if the contents of the list are not hashable – derchambers Mar 14 '18 at 22:12
1

For preventing O(n^2), you'd want to use `set()` for the lookup. This would mean that you have 2 copies of data, but you'd be O(n) then. – InsanelyADHD Mar 14 '19 at 22:21

Paolo Moretti · Answer 3 · 2013-10-10T10:33:37.313

41

This should be faster and will preserve the original order:

seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]

If you don't care about order, you can just:

new_list = list(set(my_list))

edited Oct 10 '13 at 10:33

answered Jul 20 '11 at 16:38

Paolo Moretti

54,162
23
101
92

3

This is the best answer because it considers both order-preserving and not. – Elliot Cameron Apr 25 '19 at 05:27

score 32 · Answer 4 · answered Jul 20 '11 at 16:07

32

You can do this like that:

x = list(set(x))

Example: if you do something like that:

x = [1,2,3,4,5,6,7,8,9,10,2,1,6,31,20]
x = list(set(x))
x

you will see the following result:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 31]

There is only one thing you should think of: the resulting list will not be ordered as the original one (will lose the order in the process).

answered Jul 20 '11 at 16:07

Tadeck

132,510
28
152
198

1

Smart and Pythonic :) – mushfiq Feb 22 '13 at 14:38
4

Doesn't work if the original list `x` contains lists – Calimo Dec 07 '16 at 09:54

Mr_and_Mrs_D · Answer 5 · 2021-05-12T10:42:45.087

The modern way to do it that maintains the order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(lseparatedOrbList))

as discussed by Raymond Hettinger in this answer. In python 3.5 and above this is also the fastest way - see the linked answer for details. However the keys must be hashable (as is the case in your list I think)

As of python 3.7 ordered dicts are a language feature so the above call becomes

>>> list(dict.fromkeys(lseparatedOrbList))

Performance:

"""Dedup list."""
import sys
import timeit

repeat = 3
numbers = 1000

setup = """"""
def timer(statement, msg='', _setup=None):
    print(msg, min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

print(sys.version)
s = """import random; n=%d; li = [random.randint(0, 100) for _ in range(n)]"""
for siz, m in ((150, "\nFew duplicates"), (15000, "\nMany duplicates")):
    print(m)
    setup = s % siz
    timer('s = set(); [i for i in li if i not in s if not s.add(i)]', "s.add(i):")
    timer('list(dict.fromkeys(li))', "dict:")
    timer('list(set(li))', 'Not order preserving: list(set(li)):')

gives:

3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]

Few duplicates
s.add(i): 0.008242200000040611
dict: 0.0037373999998635554
Not order preserving: list(set(li)): 0.0029409000001123786

Many duplicates
s.add(i): 0.2839437000000089
dict: 0.21970469999996567
Not order preserving: list(set(li)): 0.102068700000018

So dict seems consistently faster although approaching list comprehension with set.add for many duplicates - not sure if further varying the numbers would give different results. list(set) is of course faster but does not preserve original list order, a requirement here

FWIW on the system I have access to, this takes 43µs on an input list of 50 random integers while `s = set(); [i for i in input if i not in s if not s.add(i)]` takes 7 and `list(set(input))` takes 1.5. — Masklinn, Aug 05 '19 at 13:22

score 8 · Answer 6 · answered Jul 20 '11 at 16:06

8

This should do it for you:

new_list = list(set(old_list))

set will automatically remove duplicates. list will cast it back to a list.

answered Jul 20 '11 at 16:06

Manny D

20,310
2
29
31

7

Doesn't work if the original list `x` contains lists – Calimo Dec 07 '16 at 09:55

score 8 · Answer 7 · 2011-07-21T15:39:53.350

No, it's simply a typo, the "list" at the end must be capitalized. You can nest loops over the same variable just fine (although there's rarely a good reason to).

However, there are other problems with the code. For starters, you're iterating through lists, so i and j will be items not indices. Furthermore, you can't change a collection while iterating over it (well, you "can" in that it runs, but madness lies that way - for instance, you'll propably skip over items). And then there's the complexity problem, your code is O(n^2). Either convert the list into a set and back into a list (simple, but shuffles the remaining list items) or do something like this:

seen = set()
new_x = []
for x in xs:
    if x in seen:
        continue
    seen.add(x)
    new_xs.append(x)

Both solutions require the items to be hashable. If that's not possible, you'll probably have to stick with your current approach sans the mentioned problems.

I just upvoted your answer, but spotted you are suggesting list comprehension. That list comprehension will not work, as it will basically rewrite the `xs` list into `ys` if you use it like that: `ys = [x for x in xs if x not in ys]`. It is because the `ys` accessed within comprehension is the `ys` before assignement. — Tadeck, Jul 20 '11 at 18:18

score 5 · Answer 8 · edited May 23 '17 at 12:02

5

It's because you are missing a capital letter, actually.

Purposely dedented:

for i in lseparatedOrbList:   # capital 'L'
for j in lseparatedOrblist:   # lowercase 'l'

Though the more efficient way to do it would be to insert the contents into a set.

If maintaining the list order matters (ie, it must be "stable"), check out the answers on this question

edited May 23 '17 at 12:02

Community

1
1

answered Jul 20 '11 at 16:05

Daniel DiPaolo

55,313
14
116
115

score 3 · Answer 9 · answered May 21 '14 at 15:41

3

for unhashable lists. It is faster as it does not iterate about already checked entries.

def purge_dublicates(X):
    unique_X = []
    for i, row in enumerate(X):
        if row not in X[i + 1:]:
            unique_X.append(row)
    return unique_X

answered May 21 '14 at 15:41

Davoud Taghawi-Nejad

16,142
12
62
82

aurel · Answer 10 · 2017-07-06T19:27:54.457

-3

There is a faster way to fix this:

list = [1, 1.0, 1.41, 1.73, 2, 2, 2.0, 2.24, 3, 3, 4, 4, 4, 5, 6, 6, 8, 8, 9, 10]
list2=[]

for value in list:
    try:
        list2.index(value)
    except:
        list2.append(value)
list.clear()
for value in list2:
    list.append(value)
list2.clear()
print(list)
print(list2)

edited Jul 06 '17 at 19:27

answered Jul 06 '17 at 19:11

aurel

1
2

score -3 · Answer 11 · answered Sep 14 '17 at 06:48

In this way one can delete a particular item which is present multiple times in a list : Try deleting all 5

list1=[1,2,3,4,5,6,5,3,5,7,11,5,9,8,121,98,67,34,5,21]
print list1
n=input("item to be deleted : " )
for i in list1:
    if n in list1:
        list1.remove(n)
print list1

How to remove all duplicate items from a list

11 Answers11

Linked

Related