Remove duplicates in a list while keeping its order (Python)

Question

This is actually an extension of this question. The answers of that question did not keep the "order" of the list after removing duplicates. How to remove these duplicates in a list (python)

biglist = 

[ 

    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}

]

In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

score 39 · Answer 1 · answered Aug 29 '14 at 00:59

39

use set(), then re-sort using the index of the original list.

>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']

answered Aug 29 '14 at 00:59

egafni

1,982
1
16
11

This is fantastic! Exactly what I was looking for. Would you mind explaining how it works? (with the use of lambda etc) Thanks – Oct 01 '15 at 15:44
2

Neat Python code. The downside is that it causes an extra sort, thus an unneeded O(n * log(n)) (where otherwise O(n) would be sufficient). – yprez Nov 01 '15 at 12:48
3

Worse, it calls `.index` for each distinct element, which does a linear search, so it's an unneeded O(n^2). – kaya3 Apr 23 '21 at 21:42

score 26 · Accepted Answer · answered Oct 11 '09 at 01:11

26

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that

The answers of that question did not keep the "order"

my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist

answered Oct 11 '09 at 01:11

Alex Martelli

854,459
170
1,222
1,395

3

Hey Alex, out of curiosity why do you put the [:] on the left hand side of the assignment? I've usually seen it on the RHS. Is it just personal preference? Looking at it at first I wasn't even sure what it would do, haha. – xitrium Sep 08 '11 at 03:07
3

@xitrium Using `[:]` on the left replaced all the items in the list, instead of the list itself. It could have an effect e.g. if you do this inside a function with a list that is passed in: if you *change* the list it's changed outside the function, if you *replace* it then the outside list is unaffected). In this particular case, there are no observable effect that I can see. – Mark Oct 23 '15 at 15:00

Jochen Ritzel · Answer 3 · 2009-10-11T01:19:01.653

13

Generators are great.

def unique( seq ):
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add( item )
            yield item

biglist[:] = unique( biglist )

edited Oct 11 '09 at 01:19

answered Oct 11 '09 at 01:11

Jochen Ritzel

104,512
31
200
194

1

This is what I needed for my problem. I would suggest making it more generic adding `key=lambda item: item` to the method signature. Then, use `key(item)` for the set. – Harvey Jan 20 '17 at 18:31

Tarnay Kálmán · Answer 4 · 2009-10-22T21:31:38.597

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

f5(biglist,lambda x: x['link'])

*by that page

score 3 · Answer 5 · answered Mar 19 '14 at 23:20

This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]

[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]

And in the context of the answer:

[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]

Peter · Answer 6 · 2009-10-11T01:08:24.450

1

dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
      newlist.append(x)
      dups[x['link']] = None

print newlist

produces

[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]

Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.

edited Oct 11 '09 at 01:08

answered Oct 11 '09 at 00:59

Peter

127,331
53
180
211

1

You're wrong about checking in a dict being faster than in a set (lists are a completely different matter). – Alex Martelli Oct 11 '09 at 01:07
ok, fixed, thanks. I guess set is probably implemented with a hash. – Peter Oct 11 '09 at 01:08

score 1 · Answer 7 · edited Sep 09 '19 at 08:35

1

Try this :

list = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in list:
               if i not in uniq:
                   uniq.append(i)

print list
print uniq

output will be :

['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']

edited Sep 09 '19 at 08:35

Arkistarvh Kltzuonstev

6,824
7
26
56

answered Mar 05 '12 at 18:50

falco

11
1

score 0 · Answer 8 · answered Oct 11 '09 at 00:55

A super easy way to do this is:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

it searches through the whole list for every element in the list, so it's O(n^2)
it's recursive so uses a stack depth equal to the length of the list

However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

Can anyone come up with a way that is scalable? – TIMEX Oct 11 '09 at 00:59 — TIMEX, Oct 11 '09 at 00:59

score 0 · Answer 9 · answered Oct 11 '09 at 00:59

0

I think using a set should be pretty efficent.

seen_links = set()
for index in len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del(biglist[index])
    seen_links.add(link)

I think this should come in at O(nlog(n))

answered Oct 11 '09 at 00:59

ABentSpoon

5,007
1
26
23

2

in fact it is O(n^2) because del on a list is O(n) – Xavier Combelle Apr 01 '11 at 14:33

Remove duplicates in a list while keeping its order (Python)

9 Answers9

Linked