1

The topic is not new and has already been discussed in multiple posts (links at the bottom). However, I felt like the resources are scattered and it is not always clear what the the best approach is. I would also like to introduce some constraints to clearly define the behaviour that I am expecting.

Say we have a nested dictionary with any number of items and arbitrary depth:

d = {"a": {"b": {"c" : 0}},
     "b": {"c" : 1},
     "c": 2}

What is the best way to get its items?

The naive approach is quite cumbersome, especially when there are many nested levels.

>>> d["a"]["b"]["c"]
0

So the first constraint is that the keys of the items to get must be provided as tuples, for example:

key = ("a", "b", "c")

The objective now is to create some function that works as follows:

>>> getitem(d, key)
0

This format can also conveniently be applied directly as the __getitem__ method of a class.

One more constraint: I want the function to fail noisily when it is asked to get a non-existing key.

>>> getitem(d, ("asd",))
...
KeyError: 'asd'

This excludes all solutions that use item getting to vivify the dictionary.

Finally, please provide low-level code if possible. If you know of a package that solves this problem please explain the underlying mechanism.

References

edd313
  • 1,109
  • 7
  • 20

3 Answers3

3

I will propose 5 different solutions to get items in a nested dictionary that meet the criteria. Then, I will compare them based on the performance and readability. Conclusions at the end.

Possible solutions

  1. Use a for loop:
def getitem_for(d, key):
    for level in key:
        d = d[level]
    return d
  1. Use while
def getitem_while(d, key):
    while key:
        d = d[key[0]]
        key = key[1:]
    return d
  1. Use reduce
from functools import reduce
from operator import getitem

def getitem_reduce(d, key):
    return reduce(getitem, key, d)
  1. Use recursion
def getitem_recursive(d, key):
    if len(key) !=  1:
        return getitem_recursive(d[key[0]], key[1:])
    else:
        return d[key[0]]
  1. Finally, we can flatten the dictionary so that its keys are tuples, where each element represents a certain level. To flatten the dictionary:
def flatten(ndict):
    def key_value_pairs(d, key=[]):
        if not isinstance(d, dict):
            yield tuple(key), d
        else:
            for level, d_sub in d.items():
                key.append(level)
                yield from key_value_pairs(d_sub, key)
                key.pop()
    return dict(key_value_pairs(ndict))
>>> fd = flatten(d)
>>> fd
{('a', 'b', 'c'): 0, ('b', 'c'): 1, ('c',): 2}

Getting items is now trivial

>>> fd["a", "b", "c"]
0

Discussion

In terms of readability I find 1, 2, and 3 almost equivalent. Maybe reduce is not as well known as for and while loops, but still results in an elegant and concise one-liner. The recursive solutions 4 and 5 may be more difficult to understand, especially for beginners.

Now performance, here you have the simple speed tests that I ran in a Jupyter notebook on Python 3.8.

%%timeit
getitem_for(d, key)
346 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
getitem_while(d, key)
817 ns ± 67.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
getitem_reduce(d, key)
445 ns ± 11.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
getitem_recursive(d, key)
1.06 µs ± 69.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
df[key]
112 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The best approach seems to be the flattened dictionary; however, here it is how long it takes to create it from the original one:

%%timeit
flatten(d)
7.96 µs ± 779 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The recursive function and while loop are definitely to exclude. The for loop and reduce versions are comparable, even though the for loop is faster.

Conclusions

The performance tests that I run are not precise, do not necessarily apply to all nested dictionaries and Python versions. However, they help identify the for loop and reduce versions as good candidates to efficiently get the items of a nested dictionary. All solutions investigated fail noisily when trying to get a key does not exist.

Flat dictionaries are by far superior to all other options, but one must take into account the cost of flattening. This shows that you should prefer flat dictionaries over nested whenever you have control over the source of data.

edd313
  • 1,109
  • 7
  • 20
1

You could use python-benedict (I developed it), it's dict wrapper with many reusable features, including keypath support.

The library code is open-source and available on GitHub: https://github.com/fabiocaccamo/python-benedict

Installation:

pip install python-benedict

Usage:

from benedict import benedict

d = {"a": {"b": {"c" : 0}},
     "b": {"c" : 1},
     "c": 2}

key = ["a", "b", "c"]

b = benedict(d)
print(b[key)) # -> 0
Fabio Caccamo
  • 1,871
  • 19
  • 21
  • Ciao @Fabio, nice project! Could you include more details on the implementation of benedict? I had a look at the source code and found a `get_items` function in keylist_util.py that seems to be responsible for getting items and that uses a for loop. – edd313 Mar 14 '22 at 11:22
  • @edd313 thank you! No need to dig in the core function, all functionalities are available as dict methods, give a look to the README: https://github.com/fabiocaccamo/python-benedict#usage – Fabio Caccamo Mar 14 '22 at 15:04
  • The README is clear and leaves me with no doubt that benedict is a good solution with a straightforward interface. At the same time, I asked my question to specifically understand the best low-level mechanism. I will edit it and clarify. I would really appreciate if you decided to share the base mechanism that benedicts implement. – edd313 Mar 14 '22 at 15:41
  • 1
    @edd313 you can find the core func here: https://github.com/fabiocaccamo/python-benedict/blob/master/benedict/dicts/keylist/keylist_util.py#L52 – Fabio Caccamo Mar 15 '22 at 08:02
1

Short low-level answer (already mentioned):

from functools import reduce
import operator
def getitem(d,key):
    return reduce(operator.getitem, key, d)

d = {"a": {"b": {"c" : 0}},
     "b": {"c" : 1},
     "c": 2}
key = ("a", "b", "c")
getitem(d, key)
0

This is a subclass of UserDict, it is pure Python, compatible with a regular dict, and implements the mechanism above:

If you have a nested dictionary d and you want to access the value at d[1][2][3], you can simply pass the list [1, 2, 3] as the key ( d[[1, 2, 3]] ) to retrieve the desired value.

import operator
from collections import UserDict, defaultdict
from functools import reduce
from pprint import pformat
from copy import deepcopy


def nested_dict():
    """
    Helper function to create a nested defaultdict.
    """
    return defaultdict(nested_dict)


def convert_to_default_dict(di):
    """
    Recursively converts a dictionary to a nested defaultdict.
    """
    if isinstance(di, dict):
        ndi = nested_dict()
        for k, v in di.items():
            ndi[k] = convert_to_default_dict(v)
        return ndi
    return di


def convert_to_normal_dict_simple(di):
    """
    Recursively converts a nested defaultdict back to a normal dictionary.
    """
    if isinstance(di, defaultdict):
        di = {k: convert_to_normal_dict_simple(v) for k, v in di.items()}
    return di


class MultiKeyDict(UserDict):
    """
    A dictionary class that allows accessing elements with nested keys using lists.
    Inherits from UserDict.

    Methods:
        __init__(self, initialdata=None, **kwargs):
            Initializes the MultiKeyDict object with optional initial data.

        __getitem__(self, key):
            Retrieves the value associated with the given key(s) from the nested dictionary.

        __setitem__(self, key, value):
            Sets the value associated with the given key(s) in the nested dictionary.

        __str__(self):
            Returns a string representation of the nested dictionary.

        __repr__(self):
            Returns a string representation of the nested dictionary.

        get(self, key, default=None):
            Retrieves the value associated with the given key(s) from the nested dictionary,
            or returns the default value if the key(s) is not found.

        pop(self, key, default=None):
            Removes and returns the value associated with the given key(s) from the nested dictionary,
            or returns the default value if the key(s) is not found.

        __delitem__(self, key):
            Removes the key(s) and its associated value(s) from the nested dictionary.

        setdefault(self, key, default=None):
            Raises a TypeError indicating that 'setdefault' is not allowed for the MultiKeyDict class.

        to_dict(self):
            Converts the nested dictionary to a normal dictionary and returns it.

        copy(self):
            Creates a deep copy of the MultiKeyDict object and returns it.

        items(self):
            Returns a list of key-value pairs from the nested dictionary.

        keys(self):
            Returns a list of keys from the nested dictionary.

        values(self):
            Returns a list of values from the nested dictionary.

        update(self, other=(), **kwds):
            Updates the nested dictionary with the key-value pairs from another dictionary.

        clear(self):
            Clears all the elements from the nested dictionary.

        reversed(self):
            Returns a reversed iterator of the keys in the nested dictionary.
    """

    def __init__(self, dict=None, /, **kwargs):
        super().__init__(dict,**kwargs)
        self.data = convert_to_default_dict(self.data)

    def __getitem__(self, key, /):
        if isinstance(key, list):
            v = self._get_from_original_iter(keys=key)
            if isinstance(v, defaultdict):
                return convert_to_normal_dict_simple(v)
            return v
        if isinstance(v := self.data[key], defaultdict):
            return convert_to_normal_dict_simple(v)
        return v

    def __setitem__(self, key, value):
        if isinstance(key, list):
            self._set_in_original_iter(key, value)
        else:
            self.data[key] = value

    def __str__(self):
        return pformat(convert_to_normal_dict_simple(self.data), width=1)

    def __repr__(self):
        return self.__str__()

    @staticmethod
    def _convert2dict(d):
        try:
            return convert_to_normal_dict_simple(d)
        except Exception:
            return d

    def get(self, key, default=None):
        v = default
        if not isinstance(key, list):
            if key in self.data:
                v = self.data[key]
        else:
            v = self._get_from_original_iter(key)
        v = MultiKeyDict._convert2dict(v)
        return v

    def pop(self, key, default=None):
        if not isinstance(key, list):
            v = super().pop(key, default)
            v = MultiKeyDict._convert2dict(v)
            return v
        else:
            return self._convert2dict(self._del_and_return(key))

    def _del_and_return(self, key):
        newkey = key[:-1]
        delkey = key[-1]
        h = reduce(operator.getitem, newkey, self.data)
        value1 = h[delkey]
        del h[delkey]
        return value1

    def __delitem__(self, key):
        if not isinstance(key, list):
            super().__delitem__(key)
        else:
            _ = self._del_and_return(key)

    def setdefault(self, key, default=None):
        raise TypeError("setdefault not allowed!")

    def to_dict(self):
        return convert_to_normal_dict_simple(self.data)

    def copy(self):
        return MultiKeyDict(deepcopy(self.data))

    def items(self):
        return self.to_dict().items()

    def keys(self):
        return self.to_dict().keys()

    def values(self):
        return self.to_dict().values()

    def update(self, other=(), /, **kwds):
        super().update(other, **kwds)
        self.data = convert_to_default_dict(self.data)

    def _get_from_original_iter(self, keys):
        return reduce(operator.getitem, keys, self.data)

    def _set_in_original_iter(self, keys, value):
        self._get_from_original_iter(keys[:-1])[keys[-1]] = value

    def clear(self):
        self.data = convert_to_default_dict({})

    def reversed(self):
        return reversed(list(iter(self.keys())))

Here are all the compatibility tests:

dict2 = {2: {"c": 222}, 3: {"d": {3, 6}}}
d = MultiKeyDict(dict2)

d[[1, 3, 4, 5, 67]] = 100
print(d[[1, 3]])
dd = {2: {"c": 222}, 3: {"d": {3, 6}}}
print(f"{list(d)=}")
print(f"{len(d)=}")
print(f"{d[1]=}")
print(f"{d[1][3]=}")
print(f"{d[[1,3]]=}")
d[[23, 4, 5, 323]] = "x"
print(f"""d[[23,4,5,323]] = 'x'={d}""")
print(f"{23 in d=}")
del d[[1, 3]]
print(f"""del d[[1,3]]={d}""")
del d[1]
print(f"""del d[1]={d}""")
di2 = d.copy()
print(f"{di2 == d=}")
print(f"{di2 is d=}")
di2.clear()
print(f"""di2.clear()={di2}""")
print(f"{list(iter(d))=}")
print(f"{d.get(2)=}")
print(f"{d.get([23,4,5])=}")
print(f"{d.items()=}")
print(f"{d.keys()=}")
print(f"{d.pop(3)=}")
print(f"{d.pop([23,4,5])=}")
print(f"""{d.popitem()=}""")
print(f"""after d.popitem={d}""")
dict2 = {2: {"c": 222}, 3: {"d": {3, 6}}, 4: 3, 33: {33: 2}}
d = MultiKeyDict(dict2)
print(f"""{list(d.reversed())=}""")
d.update({4: {44: 4}})
print(f"""d.update...={d}""")
d5 = d | {3: 4}
d |= {3: 4}
print(f"""d |= {{3:4}}={d}""")
print(f'{d.to_dict()=}')





{4: {5: {67: 100}}}
list(d)=[2, 3, 1]
len(d)=3
d[1]={3: {4: {5: {67: 100}}}}
d[1][3]={4: {5: {67: 100}}}
d[[1,3]]={4: {5: {67: 100}}}
d[[23,4,5,323]] = 'x'={1: {3: {4: {5: {67: 100}}}},
 2: {'c': 222},
 3: {'d': {3,
           6}},
 23: {4: {5: {323: 'x'}}}}
23 in d=True
del d[[1,3]]={1: {},
 2: {'c': 222},
 3: {'d': {3,
           6}},
 23: {4: {5: {323: 'x'}}}}
del d[1]={2: {'c': 222},
 3: {'d': {3,
           6}},
 23: {4: {5: {323: 'x'}}}}
di2 == d=True
di2 is d=False
di2.clear()={}
list(iter(d))=[2, 3, 23]
d.get(2)={'c': 222}
d.get([23,4,5])={323: 'x'}
d.items()=dict_items([(2, {'c': 222}), (3, {'d': {3, 6}}), (23, {4: {5: {323: 'x'}}})])
d.keys()=dict_keys([2, 3, 23])
d.pop(3)={'d': {3, 6}}
d.pop([23,4,5])={323: 'x'}
d.popitem()=(2, {'c': 222})
after d.popitem={23: {4: {}}}
list(d.reversed())=[33, 4, 3, 2]
d.update...={2: {'c': 222},
 3: {'d': {3,
           6}},
 4: {44: 4},
 33: {33: 2}}
d |= {3:4}={2: {'c': 222},
 3: 4,
 4: {44: 4},
 33: {33: 2}}
d.to_dict()={2: {'c': 222}, 3: 4, 4: {44: 4}, 33: {33: 2}}

For lazy ones :) : I also made a pip package: pip install mymulti-key-dict

Hans
  • 148
  • 2
  • 7