
In Tim Peters' answer to "Are there any reasons not to use an ordered dictionary", he says:

> OrderedDict is a subclass of dict.
>
> It's not a lot slower, but at least doubles the memory over using a plain dict.

Now, while going through a particular question, I tried some sample checks using IPython, and both of them seem to contradict the earlier reasoning:

  1. both dict and OrderedDict are of the same size
  2. operating on an OrderedDict easily takes around 7-8 times as long as operating on a dict (hence a lot slower)

Can someone explain to me where I'm going wrong in my reasoning?


Create a large dict and OrderedDict and compare sizes:

import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

>>> sys.getsizeof(test_dict)
786712
>>> sys.getsizeof(test_ordered_dict)
786712

Check the time taken for the insertions using %timeit:

import sys
import random
from collections import OrderedDict

def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()

%timeit for x in range(100): operate_on_ordered_dict(100)
100 loops, best of 3: 9.24 ms per loop

%timeit for x in range(100): operate_on_dict(100)
1000 loops, best of 3: 1.23 ms per loop

1 Answer


I think the size issue is due to the fact that there's no __sizeof__ method defined in the Python 2.x implementation of OrderedDict, so it simply falls back to dict's __sizeof__ method.
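You can verify that directly (this is on CPython 2.7; I'd expect other 2.x versions to behave the same, but check on yours):

>>> from collections import OrderedDict
>>> '__sizeof__' in vars(OrderedDict)   # OrderedDict defines no __sizeof__ of its own in 2.7
False
>>> '__sizeof__' in vars(dict)          # so attribute lookup falls back to dict's method
True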

To demonstrate this, I've created a class A that extends list, overriding __getitem__ and adding an extra method foo, to check whether that affects the reported size.

class A(list):
    def __getitem__(self, k):            # same behaviour as list, but dispatched through Python code
        return list.__getitem__(self, k)
    def foo(self):                        # extra method, to see whether it changes the reported size
        print 'abcde'

>>> a = A(range(1000))
>>> b = list(range(1000))

But sys.getsizeof still reports the same size for both:

>>> sys.getsizeof(a), sys.getsizeof(b)
(9120, 9120)

Of course A is going to be slower, because its methods run in Python while list's methods run in pure C.

>>> %%timeit
... for _ in xrange(1000):
...     a[_]
... 
1000 loops, best of 3: 449 µs per loop
>>> %%timeit
... for _ in xrange(1000):
...     b[_]
... 
10000 loops, best of 3: 52 µs per loop

And this seems to be fixed in Python 3, where there's a well-defined __sizeof__ method now:

def __sizeof__(self):
    sizeof = _sys.getsizeof
    n = len(self) + 1                       # number of links including root
    size = sizeof(self.__dict__)            # instance dictionary
    size += sizeof(self.__map) * 2          # internal dict and inherited dict
    size += sizeof(self.__hardroot) * n     # link objects
    size += sizeof(self.__root) * n         # proxy objects
    return size
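
With that in place, repeating the size comparison from the question on Python 3 should show the gap Tim is talking about (a rough sketch; the exact numbers depend on the Python version and build, so I'm not quoting any here):

import sys
from collections import OrderedDict

test_dict = {key: None for key in range(10000)}
test_ordered_dict = OrderedDict((key, None) for key in range(10000))

# getsizeof now dispatches to OrderedDict.__sizeof__, which (in the pure-Python
# version above) also counts the instance dictionary, the internal map and the
# per-key link objects, so the OrderedDict reports a noticeably larger size
print(sys.getsizeof(test_dict))
print(sys.getsizeof(test_ordered_dict))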
    +1 I think you are right about the missing `__sizeof__`, but your answer doesn't explain Tim's claim that `It's not a *lot* slower`. FWIW, I get the same 7-8x factor of running time whether I choose a range of 100 or 10000 keys. – Anshul Goyal Jul 31 '14 at 11:06
  • @mu無 Maybe that's what Tim meant by **lot** there. – Ashwini Chaudhary Jul 31 '14 at 11:40
  • @mu無 Better post a comment on his answer. – Ashwini Chaudhary Jul 31 '14 at 13:00
  • [Done](http://stackoverflow.com/questions/18951143/are-there-any-reasons-not-to-use-an-ordered-dictionary/18951209#comment38985313_18951209) – Anshul Goyal Jul 31 '14 at 14:17
  • > also added an additional method foo to check if that affects the size. Adding a method does not increase the size of an instance, as methods are attributes on the class. A more accurate demonstration would be adding a new attribute "foo" in the subclass's `__init__`. – pankaj Jun 05 '19 at 16:44