6

In Python 2.2 (don't ask), what's the neatest way to sort a list and remove duplicates?

I can obviously write a function that would sort() then iterate, but am wondering if there's an idiomatic one-liner.

edit: The list is short, so efficiency is not a concern. Also, the elements are immutable.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • 1
    I can't recall if the `set` module was in 2.2. If it was, `set(myList)` will remove all dupes. – g.d.d.c Oct 14 '11 at 16:58
  • @g.d.d.c: Already checked, `sets` is 2.3+. – NPE Oct 14 '11 at 16:58
  • Ahh. Any chance it'd be in `__future__` in that version? Are the values hashable? – g.d.d.c Oct 14 '11 at 17:02
  • @g.d.d.c Don't think it's in `__future__`. The values are strings, and there aren't many, so don't care about efficiency. – NPE Oct 14 '11 at 17:03
  • 3
    Future answerers may find the [2.2 documentation](http://docs.python.org/release/2.2/) useful, to check that the language features their solution depends on actually exist. – Kevin Oct 14 '11 at 17:04
  • 1
    Here's a recipe for sets for python 2.2 http://code.activestate.com/recipes/106469-yet-another-set-class-for-python/. – Steven Rumbalski Oct 14 '11 at 18:04

4 Answers4

7

For old python versions, and since you're using strings, there's no one-liner I can think of, but a pattern would probably be this, using dictionaries:

def sorted_uniq(your_list):
    table = {}
    for s in your_list:
        table[s] = None
    k = table.keys()
    k.sort()
    return k

Adapted from an ancient ActiveState code snippet thread that Alex Martelli himself wrote several comments on: http://code.activestate.com/recipes/52560/

A shorter way with list comprehensions:

def sort_uniq(alist):
   d = {}
   mod_list = [d.setdefault(i,i) for i in alist if i not in d]
   mod_list.sort()
   return mod_list

Aside from Steven's neat (yet slightly unattractive) one liner, I think this heads toward the fewest lines and most idiomatic way of doing it with Python 2.2:

Thanks to Steven Rumbalski in the comments, the 2nd version can be condensed further with python's zip function:

def sort_uniq(alist):
   mod_list = dict(zip(alist,alist)).keys()
   mod_list.sort()
   return mod_list

If list.sort() didn't operate by side effect, we'd have a one liner. ;)

wkl
  • 77,184
  • 16
  • 165
  • 176
  • This is what I was coming up with as well. I keep trying to think of a clever way to encapsulate it as a one-liner, but can't. – g.d.d.c Oct 14 '11 at 17:08
  • Hate to be a pain, but `sorted()` is Python 2.4+. Easy to replace with `sort()` though. – NPE Oct 14 '11 at 17:10
  • @aix - Yeah I was wondering if `sorted` was in 2.2, but the docs didn't say so - switched to calling `sort` on `table.keys()`. – wkl Oct 14 '11 at 17:12
  • 1
    On your second version you could get rid of referring to the intermediate dict and do `mod_list = dict(zip(alist, alist)).keys()` – Steven Rumbalski Oct 14 '11 at 17:52
  • @birryree: I ended up liking that enough that I added a modified version to my answer. – Steven Rumbalski Oct 14 '11 at 21:20
  • @birryree: Awesome, I think the third version is as close to an idiomatic one-liner as we're going to get. – NPE Oct 14 '11 at 23:09
  • Here's my best attempt at a one-liner. `x = x.sort() or [s for s, t in zip(x, x[1:] + [None]) if s != t]` where `x` is the original list. – Steven Rumbalski Oct 15 '11 at 19:06
  • A tiny bit shorter `x = x.sort() or [s for s, t in map(None, x, x[1:]) if s != t]` – Steven Rumbalski Oct 15 '11 at 19:24
5

Idiomatic and a one-liner? No.

Here's a non-idiomatic butt-ugly one-liner.

>>> x = [4, 3, 3, 2, 4, 1]
>>> [y for y in (locals().__setitem__('d',{}) or x.sort() or x) 
        if y not in d and (d.__setitem__(y, None) or True)]
[1, 2, 3, 4]

If a simple two-liner is acceptable:

x = [4, 3, 3, 2, 4, 1]
x = dict(map(None,x,[])).keys()
x.sort()

Or make two small helper functions (works for any sequence):

def unique(it):
    return dict(map(None,it,[])).keys()

def sorted(it):
    alist = [item for item in it]
    alist.sort()
    return alist

print sorted(unique([4, 3, 3, 2, 4, 1]))

gives

[1, 2, 3, 4]

And finally, a semipythonic one liner:

x = [4, 3, 3, 2, 4, 1]
x.sort() or [s for s, t in zip(x, x[1:] + [None]) if s != t]
Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119
  • I like the new functions, makes it easy to just drop code into newer versions of python. – wkl Oct 15 '11 at 03:06
2

For the record, Python 2.2 does have sets, but under the "sets" module, so this will get you a long way:

from sets import Set
myList = list(Set(myList))
# now we're duplicate-free, a standard sorting might be enough
myList.sort()
Giacomo Lacava
  • 1,784
  • 13
  • 25
  • +1 because this is close to the in-built capabilities of newer Python versions: `list(set(myList))` --and if for some strange reason you wanted to, you could make it identical with `from sets import Set as set` – MarkHu Oct 30 '14 at 02:30
0

Probably the best answer is to use a binary tree:

# Make yield work in Python 2.2
from __future__ import generators

class TreeNode(object):
    def __init__(self, value):
        self.left = None
        self.right = None
        self.value = value

    def add(self, value):
        if value == self.value:
            return
        if value < self.value:
            if self.left is None:
                self.left = TreeNode(value)
            else:
                self.left.add(value)
        else:
            if self.right is None:
                self.right = TreeNode(value)
            else:
                self.right.add(value)

    def __iter__(self):
        if self.left is not None:
            for value in self.left:
                yield value
        yield self.value
        if self.right is not None:
            for value in self.right:
                yield value

class DedupeSorter(object):
    def __init__(self):
        self.root = None

    def add(self, value):
        if self.root is None:
            self.root = TreeNode(value)
        else:
            self.root.add(value)

    def __iter__(self):
        if self.root is None:
            return []
        else:
            return self.root.__iter__()

def dedupe_and_sort(l):
    sorter = DedupeSorter()
    for value in l:
        sorter.add(value)
    return list(sorter)

Definitely not idiomatic, but should be fast. It basically creates a tree-based set and iterates over it. I don't have Python 2.2 so hopefully it works :p

Brendan Long
  • 53,280
  • 21
  • 146
  • 188