27

This is actually an extension of this question. The answers of that question did not keep the "order" of the list after removing duplicates. How to remove these duplicates in a list (python)

biglist = 

[ 

    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}

]

In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

Community
  • 1
  • 1
TIMEX
  • 259,804
  • 351
  • 777
  • 1,080

9 Answers9

39

use set(), then re-sort using the index of the original list.

>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
egafni
  • 1,982
  • 1
  • 16
  • 11
  • This is fantastic! Exactly what I was looking for. Would you mind explaining how it works? (with the use of lambda etc) Thanks –  Oct 01 '15 at 15:44
  • 2
    Neat Python code. The downside is that it causes an extra sort, thus an unneeded O(n * log(n)) (where otherwise O(n) would be sufficient). – yprez Nov 01 '15 at 12:48
  • 3
    Worse, it calls `.index` for each distinct element, which does a linear search, so it's an unneeded O(n^2). – kaya3 Apr 23 '21 at 21:42
26

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that

The answers of that question did not keep the "order"

  • my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [ 
    {'title':'U2 Band','link':'u2.com'}, 
    {'title':'ABC Station','link':'abc.com'}, 
    {'title':'Live Concert by U2','link':'u2.com'} 
]

known_links = set()
newlist = []

for d in biglist:
  link = d['link']
  if link in known_links: continue
  newlist.append(d)
  known_links.add(link)

biglist[:] = newlist
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • 3
    Hey Alex, out of curiosity why do you put the [:] on the left hand side of the assignment? I've usually seen it on the RHS. Is it just personal preference? Looking at it at first I wasn't even sure what it would do, haha. – xitrium Sep 08 '11 at 03:07
  • 3
    @xitrium Using `[:]` on the left replaced all the items in the list, instead of the list itself. It could have an effect e.g. if you do this inside a function with a list that is passed in: if you *change* the list it's changed outside the function, if you *replace* it then the outside list is unaffected). In this particular case, there are no observable effect that I can see. – Mark Oct 23 '15 at 15:00
13

Generators are great.

def unique( seq ):
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add( item )
            yield item

biglist[:] = unique( biglist )
Jochen Ritzel
  • 104,512
  • 31
  • 200
  • 194
  • 1
    This is what I needed for my problem. I would suggest making it more generic adding `key=lambda item: item` to the method signature. Then, use `key(item)` for the set. – Harvey Jan 20 '17 at 18:31
3

This page discusses different methods and their speeds: http://www.peterbe.com/plog/uniqifiers-benchmark

The recommended* method:

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

f5(biglist,lambda x: x['link'])

*by that page

Tarnay Kálmán
  • 6,907
  • 5
  • 46
  • 57
3

This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):

mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]

[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]

And in the context of the answer:

[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]
rools
  • 1,539
  • 12
  • 21
1
dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
      newlist.append(x)
      dups[x['link']] = None

print newlist

produces

[{'link': 'u2.com', 'title': 'U2 Band'}, {'link': 'abc.com', 'title': 'ABC Station'}]

Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.

Peter
  • 127,331
  • 53
  • 180
  • 211
1

Try this :

list = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in list:
               if i not in uniq:
                   uniq.append(i)

print list
print uniq

output will be :

['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']
Arkistarvh Kltzuonstev
  • 6,824
  • 7
  • 26
  • 56
falco
  • 11
  • 1
0

A super easy way to do this is:

def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])

This is not the most efficient way, because:

  • it searches through the whole list for every element in the list, so it's O(n^2)
  • it's recursive so uses a stack depth equal to the length of the list

However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
0

I think using a set should be pretty efficent.

seen_links = set()
for index in len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del(biglist[index])
    seen_links.add(link)

I think this should come in at O(nlog(n))

ABentSpoon
  • 5,007
  • 1
  • 26
  • 23