78

How would I use python to check a list and delete all duplicates? I don't want to have to specify what the duplicate item is - I want the code to figure out if there are any and remove them if so, keeping only one instance of each. It also must work if there are multiple duplicates in a list.

For example, in my code below, the list lseparatedOrbList has 12 items - one is repeated six times, one is repeated five times, and there is only one instance of one. I want it to change the list so there are only three items - one of each, and in the same order they appeared before. I tried this:

for i in lseparatedOrbList:
   for j in lseparatedOrblist:
        if lseparatedOrbList[i] == lseparatedOrbList[j]:
            lseparatedOrbList.remove(lseparatedOrbList[j])

But I get the error:

Traceback (most recent call last):
  File "qchemOutputSearch.py", line 123, in <module>
    for j in lseparatedOrblist:
NameError: name 'lseparatedOrblist' is not defined

I'm guessing because it's because I'm trying to loop through lseparatedOrbList while I loop through it, but I can't think of another way to do it.

codeforester
  • 39,467
  • 16
  • 112
  • 140
laplacian
  • 821
  • 1
  • 6
  • 4

11 Answers11

141

Use set():

woduplicates = set(lseparatedOrblist)

Returns a set without duplicates. If you, for some reason, need a list back:

woduplicates = list(set(lseperatedOrblist))

This will, however, have a different order than your original list.

Elliot Cameron
  • 5,235
  • 2
  • 27
  • 34
Jacob
  • 41,721
  • 6
  • 79
  • 81
103

Just make a new list to populate, if the item for your list is not yet in the new list input it, else just move on to the next item in your original list.

for i in mylist:
  if i not in newlist:
    newlist.append(i)
gbeaven
  • 1,522
  • 2
  • 20
  • 40
Jonathon Vandezande
  • 2,446
  • 3
  • 22
  • 31
  • Works great and maintains the order, thanks! – laplacian Jul 20 '11 at 16:19
  • 15
    Good, I guess I haven't forgotten all my python, its only been two years. Just as a word of warning, I am pretty sure this is an O(n^2) operation so you might not want to use it on large lists (eg. 10,000 items). If you need it for big lists, I would create a hash table to check against (O(1), yielding an overall O(n) implementation), instead of checking against the list, but if you are dealing with large lists, I probably wouldn't want to use python then either. – Jonathon Vandezande Jul 20 '11 at 16:31
  • Yeah, the list shouldn't really be more than like fifteen so it's fine. – laplacian Jul 20 '11 at 16:48
  • 4
    The correct way to do this is to use a set(), see cilaris's answer below. – Wes Mason Sep 29 '11 at 14:24
  • 4
    What do you mean this is not correct way? This does the job that was asked for, without any overhead of creating a set. – Jonathon Vandezande Dec 07 '11 at 00:49
  • 6
    creating a set messes up the order – alvas Mar 03 '13 at 12:18
  • 7
    This maintains order and also works with non-hashable list items which is a plus. – Slater Victoroff Sep 12 '13 at 18:26
  • 1
    Also, a set will not work if the contents of the list are not hashable – derchambers Mar 14 '18 at 22:12
  • 1
    For preventing O(n^2), you'd want to use `set()` for the lookup. This would mean that you have 2 copies of data, but you'd be O(n) then. – InsanelyADHD Mar 14 '19 at 22:21
41

This should be faster and will preserve the original order:

seen = {}
new_list = [seen.setdefault(x, x) for x in my_list if x not in seen]

If you don't care about order, you can just:

new_list = list(set(my_list))
Paolo Moretti
  • 54,162
  • 23
  • 101
  • 92
32

You can do this like that:

x = list(set(x))

Example: if you do something like that:

x = [1,2,3,4,5,6,7,8,9,10,2,1,6,31,20]
x = list(set(x))
x

you will see the following result:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 31]

There is only one thing you should think of: the resulting list will not be ordered as the original one (will lose the order in the process).

Tadeck
  • 132,510
  • 28
  • 152
  • 198
16

The modern way to do it that maintains the order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(lseparatedOrbList))

as discussed by Raymond Hettinger in this answer. In python 3.5 and above this is also the fastest way - see the linked answer for details. However the keys must be hashable (as is the case in your list I think)


As of python 3.7 ordered dicts are a language feature so the above call becomes

>>> list(dict.fromkeys(lseparatedOrbList))

Performance:

"""Dedup list."""
import sys
import timeit

repeat = 3
numbers = 1000

setup = """"""
def timer(statement, msg='', _setup=None):
    print(msg, min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

print(sys.version)
s = """import random; n=%d; li = [random.randint(0, 100) for _ in range(n)]"""
for siz, m in ((150, "\nFew duplicates"), (15000, "\nMany duplicates")):
    print(m)
    setup = s % siz
    timer('s = set(); [i for i in li if i not in s if not s.add(i)]', "s.add(i):")
    timer('list(dict.fromkeys(li))', "dict:")
    timer('list(set(li))', 'Not order preserving: list(set(li)):')

gives:

3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]

Few duplicates
s.add(i): 0.008242200000040611
dict: 0.0037373999998635554
Not order preserving: list(set(li)): 0.0029409000001123786

Many duplicates
s.add(i): 0.2839437000000089
dict: 0.21970469999996567
Not order preserving: list(set(li)): 0.102068700000018

So dict seems consistently faster although approaching list comprehension with set.add for many duplicates - not sure if further varying the numbers would give different results. list(set) is of course faster but does not preserve original list order, a requirement here

Mr_and_Mrs_D
  • 32,208
  • 39
  • 178
  • 361
  • FWIW on the system I have access to, this takes 43µs on an input list of 50 random integers while `s = set(); [i for i in input if i not in s if not s.add(i)]` takes 7 and `list(set(input))` takes 1.5. – Masklinn Aug 05 '19 at 13:22
  • @Masklinn I added some timings – Mr_and_Mrs_D May 12 '21 at 10:19
8

This should do it for you:

new_list = list(set(old_list))

set will automatically remove duplicates. list will cast it back to a list.

Manny D
  • 20,310
  • 2
  • 29
  • 31
8

No, it's simply a typo, the "list" at the end must be capitalized. You can nest loops over the same variable just fine (although there's rarely a good reason to).

However, there are other problems with the code. For starters, you're iterating through lists, so i and j will be items not indices. Furthermore, you can't change a collection while iterating over it (well, you "can" in that it runs, but madness lies that way - for instance, you'll propably skip over items). And then there's the complexity problem, your code is O(n^2). Either convert the list into a set and back into a list (simple, but shuffles the remaining list items) or do something like this:

seen = set()
new_x = []
for x in xs:
    if x in seen:
        continue
    seen.add(x)
    new_xs.append(x)

Both solutions require the items to be hashable. If that's not possible, you'll probably have to stick with your current approach sans the mentioned problems.

  • I just upvoted your answer, but spotted you are suggesting list comprehension. That list comprehension will not work, as it will basically rewrite the `xs` list into `ys` if you use it like that: `ys = [x for x in xs if x not in ys]`. It is because the `ys` accessed within comprehension is the `ys` before assignement. – Tadeck Jul 20 '11 at 18:18
  • @Tadeck: Damn, you're right. Good catch. –  Jul 21 '11 at 15:39
5

It's because you are missing a capital letter, actually.

Purposely dedented:

for i in lseparatedOrbList:   # capital 'L'
for j in lseparatedOrblist:   # lowercase 'l'

Though the more efficient way to do it would be to insert the contents into a set.

If maintaining the list order matters (ie, it must be "stable"), check out the answers on this question

Community
  • 1
  • 1
Daniel DiPaolo
  • 55,313
  • 14
  • 116
  • 115
3

for unhashable lists. It is faster as it does not iterate about already checked entries.

def purge_dublicates(X):
    unique_X = []
    for i, row in enumerate(X):
        if row not in X[i + 1:]:
            unique_X.append(row)
    return unique_X
Davoud Taghawi-Nejad
  • 16,142
  • 12
  • 62
  • 82
-3

There is a faster way to fix this:

list = [1, 1.0, 1.41, 1.73, 2, 2, 2.0, 2.24, 3, 3, 4, 4, 4, 5, 6, 6, 8, 8, 9, 10]
list2=[]

for value in list:
    try:
        list2.index(value)
    except:
        list2.append(value)
list.clear()
for value in list2:
    list.append(value)
list2.clear()
print(list)
print(list2)
aurel
  • 1
  • 2
-3

In this way one can delete a particular item which is present multiple times in a list : Try deleting all 5

list1=[1,2,3,4,5,6,5,3,5,7,11,5,9,8,121,98,67,34,5,21]
print list1
n=input("item to be deleted : " )
for i in list1:
    if n in list1:
        list1.remove(n)
print list1
ajeet214
  • 1
  • 1