728

Python has an ordered dictionary. What about an ordered set?

Jean-François Corbett
  • 37,420
  • 30
  • 139
  • 188
Casebash
  • 114,675
  • 90
  • 247
  • 350

16 Answers16

360

The answer is no, but you can use collections.OrderedDict from the Python standard library with just keys (and values as None) for the same purpose.

Update: As of Python 3.7 (and CPython 3.6), standard dict is guaranteed to preserve order and is more performant than OrderedDict. (For backward compatibility and especially readability, however, you may wish to continue using OrderedDict.)

Here's an example of how to use dict as an ordered set to filter out duplicate items while preserving order, thereby emulating an ordered set. Use the dict class method fromkeys() to create a dict, then simply ask for the keys() back.

>>> keywords = ['foo', 'bar', 'bar', 'foo', 'baz', 'foo']

>>> list(dict.fromkeys(keywords))
['foo', 'bar', 'baz']
Asclepius
  • 57,944
  • 17
  • 167
  • 143
jrc
  • 20,354
  • 10
  • 69
  • 64
  • 7
    Maybe worth mentioning that this also works (faster) with vanilla `dict.fromkeys()`. But in that case, key order is only preserved in CPython 3.6+ implementations, so `OrderedDict` is a more portable solution when order matters. – jez Dec 21 '18 at 21:29
  • 5
    @AnwarHossain `keys = (1,2,3,1,2,1)` `list(OrderedDict.fromkeys(keys).keys())` -> `[1, 2, 3]`, python-3.7. It works. – raratiru Apr 09 '19 at 16:33
  • 6
    Can we infer that Set in Python 3.7+ preserve order too ? – Pierre Carbonnelle Sep 07 '19 at 06:14
  • 2
    This answers the actual question instead of jumping right into a work-around. – shrewmouse Sep 11 '19 at 14:16
  • 43
    @user474491 Unlike `dict`, `set` in Python 3.7+ unfortunately does not preserve order. – c z Jan 24 '20 at 15:02
  • 2
    `dict` is *not* guaranteed to preserve order. From the link, "The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon." – David Ehrmann Sep 02 '20 at 00:26
  • 9
    @DavidEhrmann Keep reading a little further on the same link: "Update December 2017: dicts retaining insertion order is guaranteed for Python 3.7" – jrc Sep 02 '20 at 08:48
  • 7
    The linked answer says `dict` retains **_insertion_** order, but it doesn't necessarily retain order after operations on its contents. It also points out several features that `OrderedDict` has, but `dict` does not. `dict` may be sufficient for the specific problem posed in this question, but one shouldn't take `dict`'s support for retaining insertion order to mean that it is a **_complete_** replacement for `OrderedDict`. – Mr. Lance E Sloan May 24 '22 at 14:52
252

There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.

OrderedSet([1, 2, 3])

This is a MutableSet, so the signature for .union doesn't match that of set, but since it includes __or__ something similar can easily be added:

@staticmethod
def union(*sets):
    union = OrderedSet()
    union.union(*sets)
    return union

def union(self, *sets):
    for set in sets:
        self |= set
LondonRob
  • 73,083
  • 37
  • 144
  • 201
Casebash
  • 114,675
  • 90
  • 247
  • 350
  • 6
    I selected my own answer because the reference from the documentation makes this close to an official answer – Casebash Dec 10 '10 at 00:59
  • 58
    The interface is NOT exactly the same as the normal set object, many essential methods are missing such as `update`, `union`, `intersection`. – xApple Dec 14 '12 at 12:48
  • 5
    FYI, I noticed that a [slightly modified version](https://github.com/LuminosoInsight/ordered-set) of the [recipe cited in this answer](http://code.activestate.com/recipes/576694/) has been [added to PyPi](https://github.com/LuminosoInsight/ordered-set) as "ordered-set" – Geoffrey Hing Feb 24 '14 at 16:35
  • 8
    I'm pretty sure you're not allowed to have two methods both called `union` in the same class. The last one will "win" and the first one will fail to exist at runtime. This is because `OrderedSet.union` (no parens) has to refer to a *single* object. – Kevin Dec 05 '14 at 17:38
  • 4
    There is also "orderedset" package which is based on the same recipe but implemented in Cython -- https://pypi.python.org/pypi/orderedset . – mbdevpl Aug 31 '16 at 11:17
  • See a below answer: https://stackoverflow.com/a/53657523/3559330. Dicts preserve order in Python 3.7+. Otherwise, use OrderedDict. – mattyb Apr 07 '21 at 23:42
173

Update: This answer is obsolete as of Python 3.7. See jrc's answer above for a better solution. Will keep this answer here only for historical reasons.


An ordered set is functionally a special case of an ordered dictionary.

The keys of a dictionary are unique. Thus, if one disregards the values in an ordered dictionary (e.g. by assigning them None), then one has essentially an ordered set.

As of Python 3.1 and 2.7 there is collections.OrderedDict. The following is an example implementation of an OrderedSet. (Note that only few methods need to be defined or overridden: collections.OrderedDict and collections.MutableSet do the heavy lifting.)

import collections

class OrderedSet(collections.OrderedDict, collections.MutableSet):

    def update(self, *args, **kwargs):
        if kwargs:
            raise TypeError("update() takes no keyword arguments")

        for s in args:
            for e in s:
                 self.add(e)

    def add(self, elem):
        self[elem] = None

    def discard(self, elem):
        self.pop(elem, None)

    def __le__(self, other):
        return all(e in other for e in self)

    def __lt__(self, other):
        return self <= other and self != other

    def __ge__(self, other):
        return all(e in self for e in other)

    def __gt__(self, other):
        return self >= other and self != other

    def __repr__(self):
        return 'OrderedSet([%s])' % (', '.join(map(repr, self.keys())))

    def __str__(self):
        return '{%s}' % (', '.join(map(repr, self.keys())))
    
    difference = property(lambda self: self.__sub__)
    difference_update = property(lambda self: self.__isub__)
    intersection = property(lambda self: self.__and__)
    intersection_update = property(lambda self: self.__iand__)
    issubset = property(lambda self: self.__le__)
    issuperset = property(lambda self: self.__ge__)
    symmetric_difference = property(lambda self: self.__xor__)
    symmetric_difference_update = property(lambda self: self.__ixor__)
    union = property(lambda self: self.__or__)
Stephan202
  • 59,965
  • 13
  • 127
  • 133
  • 1
    @Casebash: yes, one may want to define a class `OrderedSet` which subclasses `OrderedDict` and `abc.Set` and then define `__len__`, `__iter__` and `__contains__`. – Stephan202 Oct 31 '09 at 11:12
  • 1
    @Stephan202: Regrettably, the collection ABCs live in `collections`, but otherwise a good suggestion – u0b34a0f6ae Oct 31 '09 at 14:58
  • @kaizer.se: right you are. I now posted an example implementation. Turns out my previous comment was not completely correct, but the posted code should speak for itself. – Stephan202 Nov 01 '09 at 15:57
  • 5
    This is true, but you do have a lot of wasted space as a result, which leads to suboptimal performance. – Daniel Kats Oct 03 '12 at 15:11
  • 5
    An addition; collections.OrderedDict is also available in python 2.7. – Nurbldoff Sep 18 '13 at 12:11
  • 5
    Doing `OrderedSet([1,2,3])` raises a TypeError. How does the constructor even work? Missing usage example. – xApple Apr 28 '17 at 13:06
  • 4
    This answer needs to be rewritten to: (1) support initialization using a list of tuple, (2) use `dict` (since it is now ordered) via composition rather than inheritance, and (3) use `collections.abc.MutableSet`. – Asclepius Apr 19 '20 at 00:58
  • 1
    `__sub__` etc are not normally defined. Needs to be imported or referenced differently. – Torxed Jan 02 '21 at 16:21
  • How does this answer have so many upvotes? Did this work in Python 2? As @Torxed wrote, this results in `NameError: name '__sub__' is not defined` (Python 3.7 and 3.8). – Socob Mar 15 '21 at 18:44
  • @Socob I'm sure it worked back then. I haven't done any serious development in Python for many years now, so can't comment on the current state of this code, sorry. – Stephan202 Mar 15 '21 at 18:56
  • 1
    @Stephan202 It still works, there's just some changes since Py2. But this is expected, as this is a bit niche and out of the ordinary development one would do :) Just left a comment to point people in the right direction, don't even remember how I fixed it hehe. – Torxed Mar 15 '21 at 21:15
  • 1
    @Torxed OK, no, this didn’t work, not even in Python 2. Someone recently edited the answer (https://stackoverflow.com/revisions/1653978/5), which broke the code. I’ve submitted an edit to revert this. – Socob Mar 16 '21 at 13:41
  • @Socob tnx! I approved your edit (minus the dropped empty line; stylistic preference ;) ) – Stephan202 Mar 16 '21 at 16:42
  • 1
    Gotta agree with @xApple here, I'm getting a `TypeError: 'int' object is not iterable` when trying the most basic use-case of `OrderedSet([1, 3, 2])` – JamesTheAwesomeDude May 12 '21 at 04:33
59

Implementations on PyPI

While others have pointed out that there is no built-in implementation of an insertion-order preserving set in Python (yet), I am feeling that this question is missing an answer which states what there is to be found on PyPI.

There are the packages:

Some of these implementations are based on the recipe posted by Raymond Hettinger to ActiveState which is also mentioned in other answers here.

Some differences

  • ordered-set (version 1.1)
  • advantage: O(1) for lookups by index (e.g. my_set[5])
  • oset (version 0.1.3)
  • advantage: O(1) for remove(item)
  • disadvantage: apparently O(n) for lookups by index

Both implementations have O(1) for add(item) and __contains__(item) (item in my_set).

Asclepius
  • 57,944
  • 17
  • 167
  • 143
Daniel K
  • 2,977
  • 2
  • 24
  • 23
  • 3
    A new contender is [collections_extended.setlist](http://stackoverflow.com/a/28052907/1031434). Functions like `set.union` don't work on it though, even though it inherits `collections.abc.Set`. – Tim Diels Mar 16 '16 at 23:20
  • 5
    `OrderedSet` now supports [`remove`](http://orderedset.readthedocs.org/en/latest/orderedset.html#orderedset.OrderedSet.remove) – warvariuc Mar 19 '16 at 07:22
  • There is also `SortedSet` from sortedcontainers 2.3.0 with a bunch of other sorted stuff. – ceprio Apr 28 '21 at 21:11
51

I can do you one better than an OrderedSet: boltons has a pure-Python, 2/3-compatible IndexedSet type that is not only an ordered set, but also supports indexing (as with lists).

Simply pip install boltons (or copy setutils.py into your codebase), import the IndexedSet and:

>>> from boltons.setutils import IndexedSet
>>> x = IndexedSet(list(range(4)) + list(range(8)))
>>> x
IndexedSet([0, 1, 2, 3, 4, 5, 6, 7])
>>> x - set(range(2))
IndexedSet([2, 3, 4, 5, 6, 7])
>>> x[-1]
7
>>> fcr = IndexedSet('freecreditreport.com')
>>> ''.join(fcr[:fcr.index('.')])
'frecditpo'

Everything is unique and retained in order. Full disclosure: I wrote the IndexedSet, but that also means you can bug me if there are any issues. :)

NOhs
  • 2,780
  • 3
  • 25
  • 59
Mahmoud Hashemi
  • 2,655
  • 30
  • 19
  • Indexing does not work when negative indexes are supplied. For instance, this s[-4:-1] returns IndexedSet([]) on a very non-empty set. – darlove Jan 01 '21 at 11:31
  • 1
    @darlove Not sure what version you're on but negative indexes are supported and your supplied case does not reproduce on the issue you opened: https://github.com/mahmoud/boltons/issues/274 – Mahmoud Hashemi Jan 06 '21 at 22:45
27

If you're using the ordered set to maintain a sorted order, consider using a sorted set implementation from PyPI. The sortedcontainers module provides a SortedSet for just this purpose. Some benefits: pure-Python, fast-as-C implementations, 100% unit test coverage, hours of stress testing.

Installing from PyPI is easy with pip:

pip install sortedcontainers

Note that if you can't pip install, simply pull down the sortedlist.py and sortedset.py files from the open-source repository.

Once installed you can simply:

from sortedcontainers import SortedSet
help(SortedSet)

The sortedcontainers module also maintains a performance comparison with several alternative implementations.

For the comment that asked about Python's bag data type, there's alternatively a SortedList data type which can be used to efficiently implement a bag.

GrantJ
  • 8,162
  • 3
  • 52
  • 46
  • Note that the `SortedSet` class there requires members to be comparable and hashable. – gsnedders Nov 24 '14 at 19:28
  • 7
    @gsnedders The builtins `set` and `frozenset` also require elements to be hashable. The comparable constraint is the addition for `SortedSet`, but it's also an obvious constraint. – gotgenes Jan 29 '15 at 19:23
  • 2
    As the name suggests, this does not maintain order. It is nothing but sorted(set([sequence])) which makes better? – ldmtwo Nov 06 '18 at 00:32
  • @ldmtwo I'm not sure which you're referring to but just to be clear, [SortedSet](http://www.grantjenks.com/docs/sortedcontainers/sortedset.html) as part of [Sorted Containers](http://www.grantjenks.com/docs/sortedcontainers/) does maintain sorted order. – GrantJ Nov 06 '18 at 17:50
  • 3
    @GrantJ - It is the difference between whether it maintains _insertion_ order or _sort_ order. Most of the other answers are regarding insertion order. I think you are already aware of this based on your first sentence, but it's probably what ldmtwo is saying. – Justin Apr 10 '19 at 14:00
  • This one is available by default on leetcode – Eduardo Feb 12 '22 at 01:14
16

As other answers mention, as for python 3.7+, the dict is ordered by definition. Instead of subclassing OrderedDict we can subclass abc.collections.MutableSet or typing.MutableSet using the dict's keys to store our values.

import itertools
import typing

T = typing.TypeVar("T")

class OrderedSet(typing.MutableSet[T]):
    """A set that preserves insertion order by internally using a dict."""

    def __init__(self, iterable: typing.Iterator[T]):
        self._d = dict.fromkeys(iterable)

    def add(self, x: T) -> None:
        self._d[x] = None

    def discard(self, x: T) -> None:
        self._d.pop(x, None)

    def __contains__(self, x: object) -> bool:
        return self._d.__contains__(x)

    def __len__(self) -> int:
        return self._d.__len__()

    def __iter__(self) -> typing.Iterator[T]:
        return self._d.__iter__()

    def __str__(self):
        return f"{{{', '.join(str(i) for i in self)}}}"

    def __repr__(self):
        return f"<OrderedSet {self}>"

Then just:

x = OrderedSet([1, 2, -1, "bar"])
x.add(0)
assert list(x) == [1, 2, -1, "bar", 0]

I added this code, with some tests, in a small library, so anyone can just pip install it.

bustawin
  • 684
  • 7
  • 11
  • 2
    Don't use this as-is. `discard` should never ever raise a `KeyError`. Also note that this doesn't provide a sensible `__repr__` – Jason Forbes Mar 22 '21 at 22:50
  • @JasonForbes You are right —in fact we addressed your comments in the linked repo. So I just brought those fixes in this answer. Thank you for pointing it out! :-) – bustawin Mar 25 '21 at 22:10
12

In case you're already using pandas in your code, its Index object behaves pretty like an ordered set, as shown in this article.

Examples from the article:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA & indB  # intersection
indA | indB  # union
indA - indB  # difference
indA ^ indB  # symmetric difference
Berislav Lopac
  • 16,656
  • 6
  • 71
  • 80
  • Can you include an example in this answer? Links tend to be broken after some time. – Alechan Apr 11 '20 at 16:31
  • 1
    for the difference between sets, you actually need to use `indA.difference(indB)`, the minus sign performs standard subtraction – gg349 Apr 28 '20 at 15:22
  • 3
    It's important to note that `pd.Index` allows for duplicate elements, which one would not expect from an actual Python `set`. – jfaccioni May 24 '21 at 13:30
10

A little late to the game, but I've written a class setlist as part of collections-extended that fully implements both Sequence and Set

>>> from collections_extended import setlist
>>> sl = setlist('abracadabra')
>>> sl
setlist(('a', 'b', 'r', 'c', 'd'))
>>> sl[3]
'c'
>>> sl[-1]
'd'
>>> 'r' in sl  # testing for inclusion is fast
True
>>> sl.index('d')  # so is finding the index of an element
4
>>> sl.insert(1, 'd')  # inserting an element already in raises a ValueError
ValueError
>>> sl.index('d')
4

GitHub: https://github.com/mlenzen/collections-extended

Documentation: http://collections-extended.lenzm.net/en/latest/

PyPI: https://pypi.python.org/pypi/collections-extended

Michael Lenzen
  • 822
  • 8
  • 7
9

There's no OrderedSet in official library. I make an exhaustive cheatsheet of all the data structure for your reference.

DataStructure = {
    'Collections': {
        'Map': [
            ('dict', 'OrderDict', 'defaultdict'),
            ('chainmap', 'types.MappingProxyType')
        ],
        'Set': [('set', 'frozenset'), {'multiset': 'collection.Counter'}]
    },
    'Sequence': {
        'Basic': ['list', 'tuple', 'iterator']
    },
    'Algorithm': {
        'Priority': ['heapq', 'queue.PriorityQueue'],
        'Queue': ['queue.Queue', 'multiprocessing.Queue'],
        'Stack': ['collection.deque', 'queue.LifeQueue']
        },
    'text_sequence': ['str', 'byte', 'bytearray']
}
fhdrsdg
  • 10,297
  • 2
  • 41
  • 62
AbstProcDo
  • 19,953
  • 19
  • 81
  • 138
  • Some odd things in this cheatsheet: according to [collections.abc](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes), Sequences are Collections, not a sibling. And iterator does not support indexing, so shouldn't be on the same group as lists and tuples. Also, all text_sequences are also Sequence – MestreLion Oct 30 '21 at 09:28
5

As others have said, OrderedDict is a superset of an ordered set in terms of functionality, but if you need a set for interacting with an API and don't need it to be mutable, OrderedDict.keys() is actually an implementation abc.collections.Set:

import random
from collections import OrderedDict, abc

a = list(range(0, 100))
random.shuffle(a)

# True
a == list(OrderedDict((i, 0) for i in a).keys())

# True
isinstance(OrderedDict().keys(), abc.Set)   

The caveats are immutability and having to build up the set like a dict, but it's simple and only uses built-ins.

David Ehrmann
  • 7,366
  • 2
  • 31
  • 40
2

The ParallelRegression package provides a setList( ) ordered set class that is more method-complete than the options based on the ActiveState recipe. It supports all methods available for lists and most if not all methods available for sets.

RichardB
  • 114
  • 9
0

There is a pip library that does this:

pip install ordered-set

Then you can use it:

from ordered_set import OrderedSet
Watchdog101
  • 700
  • 6
  • 19
0

Just use pd.unique from pandas - does exactly what you need!

>>> import pandas as pd
>>> pd.unique([3, 1, 4, 5, 2, 2])
array([3, 1, 4, 5, 2])
Xoel
  • 318
  • 4
  • 15
-3

This answer is for completeness sake. If the length of your set is small, and your code is single-threaded, a list could work just fine as it's implicitly ordered.

if not new_item in my_list:
    my_list.append(new_item)

If using this approach:

  • To append or remove an item, first check for presence as in the code above.
  • To compare equality, use set(my_list).

Checking for presence in a list of course has an O(n) complexity, but this could be acceptable for a small list, especially if high performance isn't required.

Asclepius
  • 57,944
  • 17
  • 167
  • 143
Loïc N.
  • 353
  • 3
  • 17
  • 2
    The main issue with this approach is that adding runs in O(n). Meaning it gets slower with big lists. Python's built-in sets are very good at making adding elements faster. But for simple use-cases, it certainly does work! – Draconis Nov 09 '18 at 04:19
  • This answer should not be deleted because this approach works acceptably for small lists where the best performance isn't required. – Asclepius Jul 14 '23 at 18:05
-6

For many purposes simply calling sorted will suffice. For example

>>> s = set([0, 1, 2, 99, 4, 40, 3, 20, 24, 100, 60])
>>> sorted(s)
[0, 1, 2, 3, 4, 20, 24, 40, 60, 99, 100]

If you are going to use this repeatedly, there will be overhead incurred by calling the sorted function so you might want to save the resulting list, as long as you're done changing the set. If you need to maintain unique elements and sorted, I agree with the suggestion of using OrderedDict from collections with an arbitrary value such as None.

hwrd
  • 431
  • 4
  • 7
  • 48
    The purpose for OrderedSet is to be able to get the items in the order which they where added to the set. You example could maybe called SortedSet... – Periodic Maintenance Feb 21 '13 at 14:01