219

I would like to combine OrderedDict() and defaultdict() from collections in one object, which shall be an ordered, default dict.
Is this possible?

martineau
  • 119,623
  • 25
  • 170
  • 301
jlconlin
  • 14,206
  • 22
  • 72
  • 105
  • 5
    [I wonder why you can't just create a class that inherits `OrderedDict ` and `defaultdict`?](http://stackoverflow.com/q/27712226/1484957a0) – drs Dec 30 '14 at 20:39
  • @drs see my answer below, which does exactly that: http://stackoverflow.com/a/35968897/1644561 – avyfain Mar 13 '16 at 10:00
  • 3
    Even though you've already accepted a solution, you might want to check-out the somewhat simpler `OrderedDefaultdict` class I wrote for this [answer](https://stackoverflow.com/questions/4126348/how-do-i-rewrite-this-function-to-implement-ordereddict/4127426#4127426). – martineau May 31 '11 at 21:39
  • 2
    I understand that from Python 3.7 onwards the insertion order is maintained for anything that inherits from the regular `dict` - that includes the `defaultdict`. – Peter Kilczuk May 03 '20 at 21:11

10 Answers10

99

The following (using a modified version of this recipe) works for me:

from collections import OrderedDict, Callable

class DefaultOrderedDict(OrderedDict):
    # Source: http://stackoverflow.com/a/6190500/562769
    def __init__(self, default_factory=None, *a, **kw):
        if (default_factory is not None and
           not isinstance(default_factory, Callable)):
            raise TypeError('first argument must be callable')
        OrderedDict.__init__(self, *a, **kw)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return OrderedDict.__getitem__(self, key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        if self.default_factory is None:
            args = tuple()
        else:
            args = self.default_factory,
        return type(self), args, None, None, self.items()

    def copy(self):
        return self.__copy__()

    def __copy__(self):
        return type(self)(self.default_factory, self)

    def __deepcopy__(self, memo):
        import copy
        return type(self)(self.default_factory,
                          copy.deepcopy(self.items()))

    def __repr__(self):
        return 'OrderedDefaultDict(%s, %s)' % (self.default_factory,
                                               OrderedDict.__repr__(self))
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
Zach Kelling
  • 52,505
  • 13
  • 109
  • 108
  • 3
    Deleted my answer, which was similar in thought process but designed on the fly (and hence needed to implement various other functions). – dr jimbob May 31 '11 at 16:23
  • 3
    @Neil G: You probably should just use the built-in `callable()` function to test `default_factory`. Using `isinstance(default_factory, Callable)` actually requires it to have more than just callability -- see the [docs](http://docs.python.org/library/collections.html?highlight=callable#collections.Callable) -- which is all that's is needed here. – martineau Jun 17 '12 at 17:29
  • @martineau: You're right. I believe `callable` was removed in Python 3.1 and then reinstated in Python 3.2, and I hadn't upgraded yet when I made this edit. Feel free to make the change. – Neil G Jun 17 '12 at 17:45
  • 1
    @Neil G: Actually `callable()` was first removed in Python 3.0 and then brought back in Python 3.2. Anyway, consider changing it yourself if you wish (I like my own answer better anyway ;-). I generally tend to shy away from just hopping in and changing someone else's answer, preferring instead to only make comments as I've done here. – martineau Jun 17 '12 at 18:41
  • 4
    @zeekay: I think you might need to change `self.items()` into `iter(self.items())` inside `__reduce__`. Otherwise, `PicklingError` exception is raised complaining that fifth argument of the `__reduce__` must be an iterator. – max Jul 30 '12 at 21:03
  • @max: Instead of changing it to `iter(self.items())` it would be simpler to use `self.iteritems()`. Also, the type of error you get from not doing either depends on whether you're using the `pickle` or `cPickle` module. With `pickle`, the error message is `AttributeError: 'list' object has no attribute 'next'` for the same reason (because the list object passed isn't an iterator). FWIW, the first element of the returned tuple could also be changed from `type(self)` to a simpler `self.__class__`. – martineau Oct 22 '13 at 19:18
  • 1
    When I `copy.deepcopy()` an instance of this object, I get a maximum recursion depth exception. In `DefaultOrderedDict.__deepcopy__`, my quick fix is to change the argument `copy.deepcopy(self.items())` to `copy.deepcopy(tuple(self.items())`. – chfoo Aug 18 '14 at 01:29
49

Here is another possibility, inspired by Raymond Hettinger's super() Considered Super, tested on Python 2.7.X and 3.4.X:

from collections import OrderedDict, defaultdict

class OrderedDefaultDict(OrderedDict, defaultdict):
    def __init__(self, default_factory=None, *args, **kwargs):
        #in python3 you can omit the args to super
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

If you check out the class's MRO (aka, help(OrderedDefaultDict)), you'll see this:

class OrderedDefaultDict(collections.OrderedDict, collections.defaultdict)
 |  Method resolution order:
 |      OrderedDefaultDict
 |      collections.OrderedDict
 |      collections.defaultdict
 |      __builtin__.dict
 |      __builtin__.object

meaning that when an instance of OrderedDefaultDict is initialized, it defers to the OrderedDict's init, but this one in turn will call the defaultdict's methods before calling __builtin__.dict, which is precisely what we want.

avyfain
  • 834
  • 10
  • 18
  • 23
    This answer, despite its elegance and simplicity, doesn't work in Python3. Since both OrderedDict and defaultdict are implemented in C, you get a TypeError, "multiple bases have instance lay-out conflict." That's because the C classes have differing, and incompatible, ideas of how to lay out the internal data structures. The accepted answer above works well in Python3, with a few tiny changes (super().__getitem__(... instead of OrderedDict.__getitem_(... ). I'm using Python3.5. – ivanlan May 19 '16 at 16:10
  • 4
    Interesting, this works correctly in Python 3.4.3 Is there any way to see where the TypeError is coming from in the C code? – avyfain May 25 '16 at 20:12
  • 3
    so beautiful. a shame it doesn't work in Python 3. – noɥʇʎԀʎzɐɹƆ Aug 06 '16 at 18:15
  • 14
    As of Python 3.6 this will be unnecessary, as all `dicts`, and therefore all `defaultdicts`, will be ordered. I am ok with it not working on 3.5 ;) – avyfain Dec 21 '16 at 01:13
  • 19
    Though `dicts` in CPython 3.6 preserve order, it is an implementation detail not to be relied upon, see http://stackoverflow.com/a/39980548/91243. Use `OrderedDict` if that is what you want. – amjoconn Feb 15 '17 at 19:07
  • It's an implementation detail, but if it’s there to support an official feature of the language now in the spec: ordered `**kwargs`. The quote mentioned in @amjoconn’s comment deserves to be more complete, [read release notes on 3.6 in full](https://docs.python.org/3/whatsnew/3.6.html#new-dict-implementation). Note how frustratingly dissembling the docs are: per _letter_ this “shouldn’t be relied upon”, then in parens it “may” be codified in the future, and if you ask me the _spirit_ of it is “it’s an awesome feature and it’s already in PyPy, just let’s wait before adding this to the spec”. – Anton Strogonoff Jun 02 '17 at 03:46
  • From videos, the consensus from PyCon seems to be moving toward making the ordering standard: https://www.youtube.com/watch?v=66P5FMkWoVU. Nothing official yet. – amjoconn Jul 18 '17 at 14:42
  • 14
    It's now offical Guido approved it. – Fruch Dec 20 '17 at 21:28
  • @Fruch still, dicts don't support "move_to_end" – Erik Aronesty Oct 19 '22 at 15:03
  • @ErikAronesty what's move to end ? and how it's related to the OP question? – Fruch Oct 21 '22 at 07:36
39

If you want a simple solution that doesn't require a class, you can just use OrderedDict.setdefault(key, default=None) or OrderedDict.get(key, default=None). If you only get / set from a few places, say in a loop, you can easily just setdefault.

totals = collections.OrderedDict()

for i, x in some_generator():
    totals[i] = totals.get(i, 0) + x

It is even easier for lists with setdefault:

agglomerate = collections.OrderedDict()

for i, x in some_generator():
    agglomerate.setdefault(i, []).append(x)

But if you use it more than a few times, it is probably better to set up a class, like in the other answers.

Artyer
  • 31,034
  • 3
  • 47
  • 75
29

Here's another solution to think about if your use case is simple like mine and you don't necessarily want to add the complexity of a DefaultOrderedDict class implementation to your code.

from collections import OrderedDict

keys = ['a', 'b', 'c']
items = [(key, None) for key in keys]
od = OrderedDict(items)

(None is my desired default value.)

Note that this solution won't work if one of your requirements is to dynamically insert new keys with the default value. A tradeoff of simplicity.

Update 3/13/17 - I learned of a convenience function for this use case. Same as above but you can omit the line items = ... and just:

od = OrderedDict.fromkeys(keys)

Output:

OrderedDict([('a', None), ('b', None), ('c', None)])

And if your keys are single characters, you can just pass one string:

OrderedDict.fromkeys('abc')

This has the same output as the two examples above.

You can also pass a default value as the second arg to OrderedDict.fromkeys(...).

Taylor D. Edmiston
  • 12,088
  • 6
  • 56
  • 76
  • 2
    Thank you! `od = OrderedDict((k, None) for k in iterable)` – n8henrie Jun 03 '16 at 15:53
  • 1
    This assumes your keys are predefined in some iterable though, so downstream objects would need to be aware that adding a new key requires an initial value. To be more precise, you couldn't assume an initial value for something like: `>>> od = OrderedDefaultDict(int) >>> od['foo'] += 100 OrderedDefaultDict([('foo', 100)])` This case would be correctly handled by a solution like [this one](http://stackoverflow.com/a/35968897/1644561). – avyfain Oct 13 '16 at 22:42
  • @avyfain That's correct. For my use case, it was just the initial data so future inserts of keys not previously defined wasn't relevant. I'll add a note to make the assumption explicit. – Taylor D. Edmiston Oct 13 '16 at 22:45
11

Another simple approach would be to use dictionary get method

>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d['key'] = d.get('key', 0) + 1
>>> d['key'] = d.get('key', 0) + 1
>>> d
OrderedDict([('key', 2)])
>>> 
snebel29
  • 179
  • 2
  • 6
7

A simpler version of @zeekay 's answer is:

from collections import OrderedDict

class OrderedDefaultListDict(OrderedDict): #name according to default
    def __missing__(self, key):
        self[key] = value = [] #change to whatever default you want
        return value
Neck Beard
  • 383
  • 5
  • 8
7

A simple and elegant solution building on @NickBread. Has a slightly different API to set the factory, but good defaults are always nice to have.

class OrderedDefaultDict(OrderedDict):
    factory = list

    def __missing__(self, key):
        self[key] = value = self.factory()
        return value
F Pereira
  • 1,157
  • 10
  • 9
0

I created slightly fixed and more simplified version of the accepted answer, actual for python 3.7.

from collections import OrderedDict
from copy import copy, deepcopy
import pickle
from typing import Any, Callable


class DefaultOrderedDict(OrderedDict):
    def __init__(
            self,
            default_factory: Callable[[], Any],
            *args,
            **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        return type(self), (self.default_factory, ), None, None, iter(self.items())

    def copy(self):
        return self.__copy__()

    def __copy__(self):
        return type(self)(self.default_factory, self)

    def __deepcopy__(self, memo):
        return type(self)(self.default_factory, deepcopy(tuple(self.items()), memo))

    def __repr__(self):
        return f'{self.__class__.__name__}({self.default_factory}, {OrderedDict(self).__repr__()})'

And, that may be even more important, provided some tests.

a = DefaultOrderedDict(list)

# testing default
assert a['key'] == []
a['key'].append(1)
assert a['key'] == [1, ]

# testing repr
assert repr(a) == "DefaultOrderedDict(<class 'list'>, OrderedDict([('key', [1])]))"

# testing copy
b = a.copy()
assert b['key'] is a['key']
c = copy(a)
assert c['key'] is a['key']
d = deepcopy(a)
assert d['key'] is not a['key']
assert d['key'] == a['key']

# testing pickle
saved = pickle.dumps(a)
restored = pickle.loads(saved)
assert restored is not a
assert restored == a

# testing order
a['second_key'] = [2, ]
a['key'] = [3, ]
assert list(a.items()) == [('key', [3, ]), ('second_key', [2, ])]
Nikolay Prokopyev
  • 1,260
  • 12
  • 22
-2

Inspired by other answers on this thread, you can use something like,

from collections import OrderedDict

class OrderedDefaultDict(OrderedDict):
    def __missing__(self, key):
        value = OrderedDefaultDict()
        self[key] = value
        return value

I would like to know if there're any downsides of initializing another object of the same class in the missing method.

Samarth
  • 119
  • 1
  • 1
  • 4
  • 2
    This is an ordered dict where the default value is always another ordered dict. Not really what the question was about. – Ivan Ivanov Jan 16 '19 at 13:38
-3

i tested the default dict and discovered it's also sorted! maybe it was just a coincidence but anyway you can use the sorted function:

sorted(s.items())

i think it's simpler

Ortal Turgeman
  • 143
  • 4
  • 14