113

Is there a Python design decision (PEP) that precludes a sorted container from being added to Python?

(OrderedDict is not a sorted container since it is ordered by insertion order.)

smci
  • 32,567
  • 20
  • 113
  • 146
Neil G
  • 32,138
  • 39
  • 156
  • 257
  • 1
    like collections.OrderedDict? – utdemir May 10 '11 at 16:37
  • 1
    It's just faster. O(1) for hashmap vs O(log n) for ordered set. – vartec May 10 '11 at 16:55
  • 25
    @utdmr: OrderedDict is sorted by insertion order — not by an arbitrary key, like a sorted container. – Neil G May 10 '11 at 19:25
  • @NeilG so you mean, it's actually unsorted, because "insertion order", seems to me, is the only order by which elements may end up in a container. – Hi-Angel Jul 21 '19 at 12:53
  • 3
    @Hi-Angel No, that's not what *sorted container* means. [E.g.](https://stackoverflow.com/questions/15582504/is-there-a-sorted-container-in-the-stl) – Neil G Jul 29 '19 at 02:16
  • @NeilG I'm unsure what you wanted to say. The post you refer to tangentially explains that a sorted container is one that sorts elements upon insertion. My point is, if you insert an element, and then don't do the sorting, you end up with "insertion order" in the container. And since I don't know other way for elements to end up in a container, it's sort of "natural order". You don't need to sort the container for elements to be that way. – Hi-Angel Jul 29 '19 at 06:49
  • 2
    "sorted container is one that sorts elements upon insertion". Not exactly: I would say that a sorted container is a container whose interface has efficient sorted (according to an arbitrary key) iteration and search. Your misunderstanding stems from your unusual definition. – Neil G Jul 29 '19 at 07:29
  • It seems now sortedcontainers is installed by default from Python 3.7.5 You can used it directly in coderpad.io and leetcode, without installation. from sortedcontainers import SortedList sl = SortedList(['e', 'a', 'c', 'd', 'b']) print(sl) – Charlie 木匠 Feb 09 '20 at 19:55

6 Answers6

111

There's also a python sortedcontainers module that implements sorted list, dict, and set types. It's very similar to blist but implemented in pure-Python and in most cases faster.

>>> from sortedcontainers import SortedSet
>>> ss = SortedSet([3, 7, 2, 2])
>>> ss
SortedSet([2, 3, 7])

It also has functionality uncommon to other packages:

>>> from sortedcontainers import SortedDict
>>> sd = SortedDict((num, num) for num in range(100000))
>>> sd.iloc[-5] # Lookup the fifth-to-last key.
99995

Disclosure: I am the author of the sortedcontainers module.

khelwood
  • 55,782
  • 14
  • 81
  • 108
GrantJ
  • 8,162
  • 3
  • 52
  • 46
  • 2
    Nice! You might want to consider updating your documentation to specify that the underlying storage is a [rope](http://en.wikipedia.org/wiki/Rope_(computer_science)). – Neil G Mar 21 '14 at 19:36
  • Also, I believe that blist is written in pure Python as well. – Neil G Mar 21 '14 at 19:38
  • 1
    @NeilG Thanks! Couple notes: blist is not written in pure Python. The sorted set, list, and dict types are based on the blist type which is a B+-tree implemented in C. Also, the underlying structure isn't really a rope; it's more similar to a B+-tree but only has one level of nodes. – GrantJ Mar 21 '14 at 23:10
  • Did you click on the wikipedia article I linked? I think you're using a rope in the same way that a lot of string libraries in C++ do. – Neil G Mar 21 '14 at 23:20
  • @NeilG Yes, I read that article but it doesn't work like that. It's simpler. I'm not sure how C++ libraries implement a rope but sortedcontainers can only have one level of nodes. – GrantJ Mar 22 '14 at 03:36
  • I see. You could have used multiple levels if you wanted to. It might actually be more efficient as the number of contiguous blocks becomes large. – Neil G Mar 22 '14 at 03:44
  • 3
    It's actually a great example of how big-O can be misleading. It would probably slow around a trillion elements but most people haven't got a terabyte of memory to worry about that. I tested it into the billions of elements and it was as fast as C-implementations. It also uses much less memory by maintaining such a simple, list-based structure. – GrantJ Mar 22 '14 at 04:02
  • 1
    Yeah, absolutely. It's the same argument they use to justify using this kind of data structure for strings especially long strings that are used in an editor. – Neil G Mar 22 '14 at 04:24
  • 2
    Anyway, thanks for writing this. I'll keep it in mind if I need this data structure. – Neil G Mar 22 '14 at 04:25
  • Hi @GrantJ. How can I prevent creation of instances of each character in a string, when I add the string to a SortedSet, via `update()`? Have just encountered that unintuitive side effect! - Answer seems use add() instead! – JGFMK May 30 '18 at 11:19
  • Also for @GrantJ https://www.reddit.com/r/python3/comments/8n7vz2/are_there_any_free_online_python_3_online_ide/ – JGFMK May 30 '18 at 11:43
  • 1
    Worked it out in the end.. add takes value, update takes iterator, so when I add a set, I iterate over the set adding one at a time. Would be nice to have overloaded method, or massAdd(), something to that effect.. – JGFMK May 30 '18 at 12:56
  • And.. When you do the add(), be sure to create instance with SortedSet() first then do add(ValueToAdd), rather than SortedSet(ValueToAdd)... – JGFMK May 30 '18 at 13:04
  • Also, do you still use Set from typings if you pass out a SortedSet - a la def func() -> Set[int]? Or are there specific typings for SortedSet too? – JGFMK May 30 '18 at 20:44
  • @JGFMK You can initialize a SortedSet using an iterable: ``SortedSet(set('abracadabra'))``. I'm not sure the right answer about typing things. – GrantJ May 30 '18 at 22:32
  • While useful, this doesn't answer the question as it's not in the standard library. – ijoseph Nov 02 '20 at 19:01
  • @GrantJ is your library available as a one file - to include by simple copy-paste, not by a pip? - I would like to use it in an environment, where I can't use pip. I need it for programming contest where I'm allowed to only post one python file, containing both my code, and all available libraries included. – bastiat Mar 14 '21 at 11:58
  • @GrantJ It seems to me that graphs at http://www.grantjenks.com/docs/sortedcontainers/performance.html#id2 are wrong. It seems to me that all of them show linear complexity which is wrong in most cases. Eg SortedDict.__contains__() shows linear time complexity which is true for skip-list, but wrong for AVL-tree, B-tree, RB-tree - contains has O(log(N)) time. – bastiat Mar 14 '21 at 12:06
  • @bastiat I replied to your questions in the duplicate issues opened on the tracker at https://github.com/grantjenks/python-sortedcontainers/issues . Please use the issue tracker for further conversation. – GrantJ Mar 14 '21 at 17:35
95

It's a conscious design decision on Guido's part (he was even somewhat reluctant regarding the addition of the collections module). His goal is to preserve "one obvious way to do it" when it comes to the selection of data types for applications.

The basic concept is that if a user is sophisticated enough to realise that the builtin types aren't the right solution for their problem, then they're also up to the task of finding an appropriate third party library.

Given that list+sorting, list+heapq and list+bisect cover many of the use cases that would otherwise rely on inherently sorted data structures, and packages like blist exist, there isn't a huge drive to add more complexity in this space to the standard library.

In some ways, it is similar to the fact that there's no multi-dimensional array in the standard library, instead ceding that task to the NumPy folks.

ncoghlan
  • 40,168
  • 10
  • 71
  • 80
  • 5
    Thanks, I was looking for the motivations behind this design decision. This is exactly the kind of answer I was looking for. My initial instinct wouldn't have been to do things this way, but the argument is very convincing. – Neil G May 11 '11 at 03:48
  • `collections.Counter` can be used as sorted set. Though it may not be efficient. – coderek Nov 16 '17 at 01:35
  • 4
    @coderek: `collections.Counter` is unsorted and not appropriate for representing a sorted set. – user2357112 Jun 19 '18 at 22:56
  • But shouldn't be at least built-in dictionary sorted? The dictionary have to be stored sorted in order to provide fast access to elements, this seems odd to me that when you iterate over it, you still somehow end up with unsorted items. – Hi-Angel Jul 21 '19 at 12:48
  • 4
    @Hi-Angel `dict` is a hash table. – Neil G Jul 22 '19 at 20:44
  • Note that as of Python 3.7.0, builtin dictionaries are guaranteed to be insertion order preserving (i.e. iteration and string conversion uses insertion ordering, rather than an arbitrary implementation dependent order). – ncoghlan Oct 12 '21 at 07:06
12

There is also the blist module that contains a sortedset data type:

sortedset(iterable=(), key=None)

>>> from blist import sortedset
>>> my_set = sortedset([3,7,2,2])
sortedset([2, 3, 7]
Adrian
  • 1,837
  • 1
  • 20
  • 21
6

Not exactly a "sorted container", but you might be interested in the standard library's bisect module, which "provides support for maintaining a list in sorted order without having to sort the list after each insertion".

Steven
  • 28,002
  • 5
  • 61
  • 51
3

There is a heapq in the standard library, it is not exactly sorted, but kind of. There is also a blist package, but it is not in the standard library.

abbot
  • 27,408
  • 6
  • 54
  • 57
-7

Python lists are ordered. If you sort them, they stay that way. In Python 2.7 an OrderedDict type was added to maintain an explicitly ordereded dictionary.

Python also has sets (a collection in which the members must be unique), but by definition they are unordered. Sorting a set just returns a list.

jathanism
  • 33,067
  • 9
  • 68
  • 86
  • 11
    Thanks for taking the time to answer. OrderedDict is sorted by insertion order rather than by an arbitrary key like a sorted container. set is also not a sorted container. – Neil G May 10 '11 at 19:28
  • 1
    Is btree perhaps what you're looking for? http://stackoverflow.com/questions/628192#628432 – jathanism May 10 '11 at 21:35
  • thanks, btree is exactly the kind of thing I was looking for. I'm going to go with blist since it's in MacPorts and has a bunch of handy data structures. – Neil G May 10 '11 at 21:43