71

I'm referring to the OrderedDict from the collections module, which is an ordered dictionary.

If it has the added functionality of being orderable, which I realize may often not be necessary but even so, are there any downsides? Is it slower? Is it missing any functionality? I didn't see any missing methods.

In short, why shouldn't I always use this instead of a normal dictionary?

Flimm
  • 136,138
  • 45
  • 251
  • 267
temporary_user_name
  • 35,956
  • 47
  • 141
  • 220
  • 1
    In addition, many packages return dicts and using them alongside OrderedDict will likely mess up the order anyway. – sashkello Sep 23 '13 at 02:59
  • 4
    My question is, *why use an OrderedDict*? Why would you need an ordered dictionary? – TerryA Sep 23 '13 at 02:59
  • I'd use OrderedDict ONLY for output formatting. Are there any other uses I'm missing? – sashkello Sep 23 '13 at 03:01
  • 1
    @Haidro, [an example](http://docs.python.org/3/library/inspect.html#inspect.Signature.parameters) from the standard library. – fjarri Sep 23 '13 at 03:16
  • 2
    If your only purpose for OrderedDict is for formatting output (presumably sorting keys), just use `for key in sorted(dictvar): print (key, dictvar[key])`. OrderedDict preserves order of insertion, not order of keys. – PaulMcG Sep 23 '13 at 07:04
  • @Wooble: Please post answers as answers, not as comments. – Ethan Furman Sep 23 '13 at 20:27
  • @Haidro: One use is for building simple file-handling utility programs. I've used an OrderedDict to store file attributes indexed by filename, ordered alphabetically. Using an OrderedDict means I don't need to re-sort the dictionary at every display update. It's a minor difference, but it made the GUI a bit snappier. – JS. Oct 02 '13 at 18:09
  • see also: https://stackoverflow.com/questions/50872498/will-ordereddict-become-redundant-in-python-3-7 – Chris_Rands Jan 25 '19 at 16:29
  • Dicts are ordered on CPython 3.6 and all other Python implementations starting with Python 3.7, this question is kind of out of date now. – Boris Verkhovskiy Jan 17 '20 at 02:29

4 Answers4

157

OrderedDict is a subclass of dict, and needs more memory to keep track of the order in which keys are added. This isn't trivial. The implementation adds a second dict under the covers, and a doubly-linked list of all the keys (that's the part that remembers the order), and a bunch of weakref proxies. It's not a lot slower, but at least doubles the memory over using a plain dict.

But if it's appropriate, use it! That's why it's there :-)

How it works

The base dict is just an ordinary dict mapping keys to values - it's not "ordered" at all. When a <key, value> pair is added, the key is appended to a list. The list is the part that remembers the order.

But if this were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.

So it's a doubly-linked list instead. That makes deleting a key constant (O(1)) time. But we still need to find the doubly-linked list node belonging to the key. To make that operation O(1) time too, a second - hidden - dict maps keys to nodes in the doubly-linked list.

So adding a new <key, value> pair requires adding the pair to the base dict, creating a new doubly-linked list node to hold the key, appending that new node to the doubly-linked list, and mapping the key to that new node in the hidden dict. A bit over twice as much work, but still O(1) (expected case) time overall.

Similarly, deleting a key that's present is also a bit over twice as much work but O(1) expected time overall: use the hidden dict to find the key's doubly-linked list node, delete that node from the list, and remove the key from both dicts.

Etc. It's quite efficient.

Tim Peters
  • 67,464
  • 13
  • 126
  • 132
  • 3
    Nice answer, but we need a link to read further, where from you retried this information. – Grijesh Chauhan Sep 23 '13 at 03:07
  • 32
    @GrijeshChauhan, I read the source code - I'm a core Python developer, so that's how I answer most questions **I** have - LOL ;-) You can find the code in `Lib/collections/__init__.py` in your Python source tree. – Tim Peters Sep 23 '13 at 03:09
  • 76
    Wait...YOU'RE THE GUY WHO WROTE TIMSORT!!! Unexpected descent from python heaven to answer my lowly question. THANKS! – temporary_user_name Sep 23 '13 at 03:21
  • 13
    LOL! You're very welcome, @Aerovistae - it was a worthy question ;-) – Tim Peters Sep 23 '13 at 03:23
  • 10
    I find when I tell people "you can find the code in your Python source tree" they never look, but when I [link to the hg repo](http://hg.python.org/cpython/file/3.3/Lib/collections/__init__.py#!19) they sometimes do. (Usually only when reading the source leads them to a question that's over my head.) – abarnert Sep 26 '13 at 00:17
  • @TimPeters you should take a look here http://stackoverflow.com/questions/19351065/how-is-the-python-grammar-used-internally :) – temporary_user_name Oct 13 '13 at 22:32
  • 6
    @GrijeshChauhan Go to your python interpreter, type `import this` then press enter, the guy who wrote it, is the guy who answered this question. – Games Brainiac May 17 '14 at 19:01
  • Hi Tim, can you explain further upon `It's not a lot slower, but at least doubles the memory over using a plain dict`? I have asked a [question on the same here](http://stackoverflow.com/q/25056387/1860929) – Anshul Goyal Jul 31 '14 at 14:09
  • @TimPeters What if I give the OrderedDict an ordering scheme other than the default (time of of insertion)? For example, [sorted alphanumerically by key](https://docs.python.org/2/library/collections.html#ordereddict-examples-and-recipes). Then I believe adding will take `O(n)` at least, because the key must now be added to the *sorted* double-linked list, which I think is done by traversing and comparing to each node's key. – onepiece Jul 27 '15 at 06:51
  • @onepiece: OrderedDicts are ordered by insertion order so there's no way to directly insert something in the middle. To achieve this, you'll need to rebuild the entire thing from scratch in the desired order to insert one (or more) items into it. I suppose you could derive your own dict subclass that allowed insertions, but that would require updating the internals maintained by base class (like the hidden dict and doubly-linked list). – martineau Feb 28 '16 at 12:05
  • 1
    @onepiece there are many algorithms devoted specifically to keeping sorted lists (for searching), such as red-black trees, which are used to create dicts. They 'sort on insertion,' and their running time is usually around O(log n). There's also my favorite, the fiendishly clever https://pypi.python.org/pypi/sortedcontainers which takes advantage of TimPeters sorting algorithm over lists to accomplish this. – std''OrgnlDave Aug 14 '17 at 07:49
  • 1
    Dicts are ordered on CPython 3.6 and all other Python implementations starting with Python 3.7. – Boris Verkhovskiy Jan 17 '20 at 02:29
  • Why not storing in the doubly linked list though? It seems the base dictionary is not necessary. – johan Jan 14 '22 at 01:50
  • 1
    @johan, the base dictionary is needed to support `O(1)` lookup time. Looking up a key in a bare doubly linked list would take time linear in the length of the list. – Tim Peters Jan 14 '22 at 01:56
  • @timpeters so asking for value at index x would be O(n) because it will traverse the double linked list until it finds the key? – TheLogicGuy Jan 14 '22 at 16:19
  • @TheLogicGuy No, `get` operation will happen on the base `unordered` dict taking O(1) time. The potential `O(n)` time is taken if you have to `delete` a node. That is where the second hidden dict comes in. You simply delete from the base dict. You find the node in the LL via the hidden dict. Delete that node from the LL. Delete that entry from the hidden dict. Done! – piepi Jan 29 '22 at 23:26
  • @TimPeters Why not have the node in the base dict itself thus eliminating the hidden dict? If the delete operation is slow, what is the alternative? Most LRUs seem to use a hashmap and a doubly linked list. – piepi Jan 29 '22 at 23:29
  • 1
    @piepi, don't mean to ignore you, but I have no more interest in this question, which is about `collections.OrderedDict`. Starting several releases ago, the built-in dict type is ordered now, It uses an entirely different implementation, with no "hidden dict". – Tim Peters Jan 30 '22 at 00:38
9

Since Python 3.7, all dictionaries are guaranteed to be ordered. The Python contributors determined that switching to making dict ordered would not have a negative performance impact. I don't know how the performance of OrderedDict compares to dict in Python >= 3.7, but I imagine they would be comparable since they are both ordered.

Note that there are still differences between the behaviour of OrderedDict and dict. See also: Will OrderedDict become redundant in Python 3.7?

Flimm
  • 136,138
  • 45
  • 251
  • 267
7

multithreading

if your dictionary is accessed from multiple threads without a lock, especially as a synchronisation point.

vanilla dict operations are atomic, and any type extended in Python is not.

In fact, I'm not even certain OrderedDict is thread-safe (without a lock), although I cannot discount the possibility that it was very carefully coded and satisfies definition of reentrancy.

lesser devils

memory usage if you create tons of these dictionaries

cpu usage if all your code does is munge these dictionaries

Dima Tisnek
  • 11,241
  • 4
  • 68
  • 120
3

why shouldn't I always use this instead of a normal dictionary

In Python 2.7, normal OrderedDict usage will create reference cycles. So any use of OrderedDict requires the garbage collector to be enabled in order to free the memory. Yes, the garbage collector is on by default in cPython, but disabling it has its uses.

e.g. With cPython 2.7.14

from __future__ import print_function

import collections
import gc

if __name__ == '__main__':
    d = collections.OrderedDict([('key', 'val')])
    gc.collect()
    del d
    gc.set_debug(gc.DEBUG_LEAK)
    gc.collect()
    for i, obj in enumerate(gc.garbage):
        print(i, obj)

outputs

gc: collectable <list 00000000033E7908>
gc: collectable <list 000000000331EC88>
0 [[[...], [...], 'key'], [[...], [...], 'key'], None]
1 [[[...], [...], None], [[...], [...], None], 'key']

Even if you just create an empty OrderedDict (d = collections.OrderedDict()) and don't add anything to it, or you explicitly try to clean it up by calling the clear method (d.clear() before del d), you will still get one self-referencing list:

gc: collectable <list 0000000003ABBA08>
0 [[...], [...], None]

This seems to have been the case since this commit removed the __del__ method in order to prevent the potential for OrderedDict to cause uncollectable cycles, which are arguably worse. As noted in the changelog for that commit:

Issue #9825: removed __del__ from the definition of collections.OrderedDict. This prevents user-created self-referencing ordered dictionaries from becoming permanently uncollectable GC garbage. The downside is that removing __del__ means that the internal doubly-linked list has to wait for GC collection rather than freeing memory immediately when the refcnt drops to zero.


Note that in Python 3, the fix for the same issue was made differently and uses weakref proxies to avoid cycles:

Issue #9825: Using __del__ in the definition of collections.OrderedDict made it possible for the user to create self-referencing ordered dictionaries which become permanently uncollectable GC garbage. Reinstated the Py3.1 approach of using weakref proxies so that reference cycles never get created in the first place.

Day
  • 9,465
  • 6
  • 57
  • 93