0

Say given a list s = [2,2,2,3,3,3,4,4,4]

I saw the following code being used to obtain unique values from s:

unique_s = sorted(unique(s))

where unique is defined as:

def unique(seq): 
    # not order preserving 
    set = {}
    map(set.__setitem__, seq, []) 
    return set.keys()

I'm just curious to know if there is any difference between this and just doing list(set(s))? Both results in a mutable object with the same values.

I'm guessing this code is faster since it's only looping once rather than twice in the case of type conversion?

taskinoor
  • 45,586
  • 12
  • 116
  • 142
Mike
  • 683
  • 10
  • 22
  • use code button for post code – nkint Jan 27 '12 at 15:05
  • 3
    I believe the code above was (is) used with (to support) python version prior to 2.3 when set() didn't exist yet. – mouad Jan 27 '12 at 15:08
  • Not a python guy, so not posting this as an answer, but it looks like they do the same thing, other than the fact that doing list(set(s)) is probably not guaranteed to preserve the order, whereas when you call unique_s=sorted(unique(s)) it seems to preserve order – NominSim Jan 27 '12 at 15:09
  • Well, there's an obvious difference in that `list(set(s))` is giving you a sorted result just by chance :P – Ricardo Cárdenes Jan 27 '12 at 15:10
  • I meant the difference between the unique() function and list(set(s)). I know list() is not order preserving. – Mike Jan 27 '12 at 15:12
  • 2
    Even without set this could be much simpler: `def unique(seq): return dict.fromkeys(seq).keys()` – yak Jan 27 '12 at 15:17
  • related: [In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique *while preserving order*?](http://stackoverflow.com/questions/89178/in-python-what-is-the-fastest-algorithm-for-removing-duplicates-from-a-list-so) – jfs Jan 27 '12 at 15:22

2 Answers2

3

You should use the code you describe:

list(set(s))

This works on all Pythons from 2.4 (I think) to 3.3, is concise, and uses built-ins in an easy to understand way.

The function unique appears to be designed to work if set is not a built-in, which is true for Python 2.3. Python 2.3 is fairly ancient (2003). The unique function is also broken for the Python 3.x series, since dict.keys returns an iterator for Python 3.x.

Dietrich Epp
  • 205,541
  • 37
  • 345
  • 415
1

For a sorted sequence you could use itertools unique_justseen() recipe to get unique values while preserving order:

from itertools import groupby
from operator import itemgetter

print map(itemgetter(0), groupby([2,2,2,3,3,3,4,4,4]))
# -> [2, 3, 4]

To remove duplicate items from a sorted sequence inplace (to leave only unique values):

def del_dups(sorted_seq):
    prev = object()
    pos = 0
    for item in sorted_seq:
        if item != prev:
            prev = item
            sorted_seq[pos] = item
            pos += 1
    del sorted_seq[pos:]

L = [2,2,2,3,3,3,4,4,4]
del_dups(L)
print L # -> [2, 3, 4]
jfs
  • 399,953
  • 195
  • 994
  • 1,670