87

Consider a Python list my_list containing ['foo', 'foo', 'bar'].

What is the most Pythonic way to uniquify and sort a list ?
(think cat my_list | sort | uniq)

This is how I currently do it and while it works I'm sure there are better ways to do it.

my_list = []
...
my_list.append("foo")
my_list.append("foo")
my_list.append("bar")
...
my_list = set(my_list)
my_list = list(my_list)
my_list.sort()
oHo
  • 51,447
  • 27
  • 165
  • 200
knorv
  • 49,059
  • 74
  • 210
  • 294
  • Possible duplicate of [How to remove duplicates from Python list and keep order?](http://stackoverflow.com/questions/479897/how-to-remove-duplicates-from-python-list-and-keep-order) –  Dec 06 '15 at 13:34

5 Answers5

142
my_list = sorted(set(my_list))
Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
  • 27
    Note that this only works for hashable types, so for example this won't work on lists. – taleinat May 28 '10 at 20:11
  • Its worth mentioning that this does everything in memory while `sort` (usually?) knows to persist to disc. If you're applying this to a large chunk of data it should fail on `MemoryError`. Good answer though :) – Reut Sharabani May 29 '17 at 08:16
  • @ReutSharabani: No, the different is that the `sort()` method operates in-place and so doesn't require additional allocation. – Ignacio Vazquez-Abrams May 29 '17 at 11:09
  • Not sure what you mean... Most if not all `sort`s will write to disc if needed. – Reut Sharabani May 29 '17 at 11:35
  • + Not python's, if thats the confusion. My point is the OPs code (cat my_list | sort | uniq) will run on files that fit in the HD while your code will have to fit in RAM. – Reut Sharabani May 29 '17 at 11:36
  • 2
    A sort followed by an in-place unique is a far more efficient operation than converting a list to a set, and then sorting that. Even using a min-heap would be preferable. – ioquatix Mar 28 '19 at 04:11
  • The top comment is wrong. This works perfectly fine on a list. – Anthony May 31 '20 at 14:35
  • @Anthony you misunderstood the top comment: this answer won't work on a sequence of lists (ie talking about the type of the sequence items, not the type of the sequence itself). – tzot Dec 05 '20 at 12:00
  • @tzot I understand the difference between sorting a list and sorting a list of lists. The top comment will confuse beginners and probably many non-beginners because the point is not clear. It's nonsense to try to sort a list of lists anyway, so the point being made is not clear in the first place. Even if that concept is to be explained, the way it is written is not clear and the point being offered is ambiguous especially to the uninitiated. – Anthony Dec 06 '20 at 17:24
  • Note that `sorted()` is important, and `list(set(my_list))` would not be enough, because [Python sets are unordered and nondeterministic](https://stackoverflow.com/questions/51927850/is-set-deterministic), so `python -c 'print(list(set(["a","b"])))'` may give a different result on each run. – nh2 Jan 02 '21 at 04:15
  • @Anthony *Your* comment is confusing. The top comment is correct: this will not work on non-hashable types. Sorting a list of lists was just an example. Perhaps that example could have been *better*, but the top comment is not *wrong*. – jamesdlin Nov 05 '22 at 02:08
20
# Python ≥ 2.4
# because of (generator expression) and itertools.groupby, sorted

import itertools

def sort_uniq(sequence):
    return (x[0] for x in itertools.groupby(sorted(sequence)))

Faster:

import itertools, operator
import sys

if sys.hexversion < 0x03000000:
    mapper= itertools.imap # 2.4 ≤ Python < 3
else:
    mapper= map # Python ≥ 3

def sort_uniq(sequence):
    return mapper(
        operator.itemgetter(0),
        itertools.groupby(sorted(sequence)))

Both versions return an generator, so you might want to supply the result to the list type:

sequence= list(sort_uniq(sequence))

Note that this will work with non-hashable items too:

>>> list(sort_uniq([[0],[1],[0]]))
[[0], [1]]
tzot
  • 92,761
  • 29
  • 141
  • 204
  • 1
    If you are using python3: Py3 map and in Py2 itertools.imap do exactly the same thing. ( In Py3 iter(map(...)) is redundant. ) – The Demz Oct 28 '13 at 10:59
  • This is much better than the accepted answer assuming you have large chunk of data. +1 – Reut Sharabani May 29 '17 at 08:14
  • @TheDemz the answer needed taking into account that Python 3 is much more commonplace now than then; thanks – tzot May 29 '17 at 21:09
  • Note that `x[0]` (or `operator.itemgetter(0)`) won't work if you're using a `key` argument to `groupby` to decide some alternate equality between elements for uniqueness purposes (roughly the equivalent of using `-f` or `-s` as arguments to `uniq`). In this case the key isn't the same as the input data elements. I think in this case something like `next(iter(x[1]))` would work to resolve to the first element of each "identical according to the key function" group instead. – Robie Basak Jun 12 '19 at 10:08
8

The straightforward solution is provided by Ignacio—sorted(set(foo)).

If you have unique data, there's a reasonable chance you don't just want to do sorted(set(...)) but rather to store a set all the time and occasionally pull out a sorted version of the values. (At that point, it starts sounding like the sort of thing people often use a database for, too.)

If you have a sorted list and you want to check membership on logarithmic and add an item in worst case linear time, you can use the bisect module.

If you want to keep this condition all the time and you want to simplify things or make some operations perform better, you might consider blist.sortedset.

Mike Graham
  • 73,987
  • 14
  • 101
  • 130
  • 1
    Consider [sortedcontainers](http://www.grantjenks.com/docs/sortedcontainers/) . [SortedSet](http://www.grantjenks.com/docs/sortedcontainers/sortedset.html) instead of blist. It's [faster](http://www.grantjenks.com/docs/sortedcontainers/performance.html) and pure-Python. – GrantJ Sep 18 '15 at 19:42
2

Others have mentioned sorted(set(my_list)), which works for hashable values such as strings, numbers and tuples, but not for unhashable types such as lists.

To get a sorted list of values of any sortable type, without duplicates:

from itertools import izip, islice
def unique_sorted(values):
    "Return a sorted list of the given values, without duplicates."
    values = sorted(values)
    if not values:
        return []
    consecutive_pairs = izip(values, islice(values, 1, len(values)))
    result = [a for (a, b) in consecutive_pairs if a != b]
    result.append(values[-1])
    return result

This can be further simplified using the "pairwise" or "unique_justseen" recipes from the itertools documentation.

taleinat
  • 8,441
  • 1
  • 30
  • 44
-4

Can't say it is clean way to do that, but just for fun:

my_list = [x for x in sorted(my_list) if not x in locals()["_[1]"]]
andreypopp
  • 6,887
  • 5
  • 26
  • 26