3

I need to get some values out of some big nested dictionaries. Out of laziness I decided to write a function that recursively calls itself until the last child is found, or the leaf is empty.

Since there are dictionaries popped out and with every new call there's a new dictionary built, I wonder how efficient this is.

Any suggestions?

def recursive_dict_get(item, string, default=False):
    if not isinstance(item, dict):
        return default

    print "called with ", item, "and string", string
    if "." in string:
        attrs = string.split(".")
        parent = attrs.pop(0)
        rest = ".".join(attrs)
        result = item.get(parent, None)
        if result is None:
            return default
        else:
            return recursive_dict_get(item.get(parent, default), rest, default)
    else:
        return item.get(string, default)

---

foo = {
            "1": {
                "2": {
                    "3": {
                        "4":{
                            "5": {
                                "6": {
                                    "7": "juice"
                                }
                            }
                        }
                    }
                }
            }
        }

print recursive_dict_get(foo, "1.2.3.4.5.6.7", False)
print "*" * 3       
print recursive_dict_get(foo, "1.2.3.4.5.6", False)
print "*" * 3
print recursive_dict_get(foo, "1.3", False)
Jay
  • 2,519
  • 5
  • 25
  • 42
  • 1
    I'm confused - are you asking if there's a better way to do this or how to test the efficiency? – thegrinner Aug 20 '13 at 15:30
  • 4
    You aren't building any new dictionaries with your code. – Blender Aug 20 '13 at 15:32
  • 1
    a quick-n-dirty way: `print reduce(dict.get, "1.2.3.4.5.6.7".split('.'), nested_dict)`. See [more general version that also allows to change values](http://stackoverflow.com/q/11918852/4279) – jfs Aug 20 '13 at 16:05

3 Answers3

5

One suggestion I have is to give split() a second argument. You can do something simpler like:

parent, rest = string.split(".", 1)

Other than that, I see no immediate issues with the code.

You can also do this without recursion:

def recursive_dict_get(item, string, default=False):
    for s in string.split('.'):
        if (isinstance(item, dict) and s in item):
            item = item[s]
        else:
            return default
    return item
arshajii
  • 127,459
  • 24
  • 238
  • 287
4

Yes, your implementation is fairly inefficient, even though it isn't building any new dictionaries, bur rather returns potentially a lot of existing ones. Regardless, you could adapt the accepted answer to Access python nested dictionary items via a list of keys to reduce your access function to one line of code. This is similar to what J.F. Sebastian (@jfs) alluded to in his comment. My take on it would be something like this:

def nonrecursive_dict_get(item, key_string, default=False):
    return reduce(lambda d, k: d.get(k, default), key_string.split('.'), item)

print "*" * 3, 'using nonrecursive_dict_get()'
print nonrecursive_dict_get(foo, "1.2.3.4.5.6.7")
print "*" * 3
print nonrecursive_dict_get(foo, "1.2.3.4.5.6")
print "*" * 3
print nonrecursive_dict_get(foo, "1.3")

Update:

Whenever efficiency is a concern, often the best thing to do is run a benchmark of the various approaches. Here's one I've used a number of times:

global_setup = """
    foo = {
            "1": {
                "2": {
                    "3": {
                        "4": {
                            "5": {
                                "6": {
                                    "7": "juice"
                                     }
                                 }
                             }
                         }
                     }
                 }
          }
"""

testcases = {
"jay":
    { 'setup' : """
        def recursive_dict_get(item, string, default=False):
            if not isinstance(item, dict):
                return default
            if "." in string:
                attrs = string.split(".")
                parent = attrs.pop(0)
                rest = ".".join(attrs)
                result = item.get(parent, None)
                if result is None:
                    return default
                else:
                    return recursive_dict_get(item.get(parent, default), rest, default)
            else:
                return item.get(string, default)
                """,
      'code' : """
        recursive_dict_get(foo, "1.2.3.4.5.6.7", False)
        recursive_dict_get(foo, "1.2.3.4.5.6", False)
        recursive_dict_get(foo, "1.3", False)
        """,
    },

"martineau":
    { 'setup' : """
        def nonrecursive_dict_get(nested_dict, key_string, default=False):
            return reduce(lambda d, k: d.get(k, default), key_string.split('.'), nested_dict)
            """,
      'code' : """
        nonrecursive_dict_get(foo, "1.2.3.4.5.6.7", False)
        nonrecursive_dict_get(foo, "1.2.3.4.5.6", False)
        nonrecursive_dict_get(foo, "1.3", False)
        """,
    },

"J.F. Sebastian":
    { 'setup' : """
        # modified to support 'default' keyword argument
        def quick_n_dirty(nested_dict, key_string, default=False):
            reduced = reduce(dict.get, key_string.split('.'), nested_dict)
            return default if reduced is None else reduced
            """,
      'code' : """
        quick_n_dirty(foo, "1.2.3.4.5.6.7", False)
        quick_n_dirty(foo, "1.2.3.4.5.6", False)
        quick_n_dirty(foo, "1.3", False)
        """,
    },

"arshajii":
    { 'setup' : """
        def recursive_dict_get(item, string, default=False):
            for s in string.split('.'):
                if (isinstance(item, dict) and s in item):
                    item = item[s]
                else:
                    return default
            return item
            """,
      'code' : """
        recursive_dict_get(foo, "1.2.3.4.5.6.7", False)
        recursive_dict_get(foo, "1.2.3.4.5.6", False)
        recursive_dict_get(foo, "1.3", False)
        """,
    },

"Brionius":
    { 'setup' : """
        def getVal(d, keys, default):
            keys = keys.split(".")
            for key in keys:
                try:
                    d = d[key]
                except KeyError:
                    return default
            return d
            """,
      'code' : """
        getVal(foo, "1.2.3.4.5.6.7", False)
        getVal(foo, "1.2.3.4.5.6", False)
        getVal(foo, "1.3", False)
        """,
    },
}

import sys
from textwrap import dedent
import timeit
N = 100000
R = 3

# remove leading whitespace from all code fragments
global_setup = dedent(global_setup)
for testcase in testcases.itervalues():
    for label, fragment in testcase.iteritems():
        testcase[label] = dedent(fragment)

timings = [(name,
            min(timeit.repeat(testcases[name]['code'],
                              setup=global_setup + testcases[name]['setup'],
                              repeat=R, number=N)),
           ) for name in testcases]

longest_name = max(len(t[0]) for t in timings)

print('fastest to slowest timings:\n'
      '  ({:,d} calls, best of {:d} repetitions)\n'.format(N, R))

ranked = sorted(timings, key=lambda t: t[1])  # sort by speed (fastest first)
for timing in ranked:
    print("{:>{width}} : {:.6f} secs ({rel:>8.6f}x)".format(
          timing[0], timing[1], rel=timing[1]/ranked[0][1], width=longest_name))

Output:

fastest to slowest timings:
  (100,000 calls, best of 3 repetitions)

J.F. Sebastian : 1.287209 secs (1.000000x)
      Brionius : 1.420099 secs (1.103239x)
      arshajii : 1.431521 secs (1.112112x)
     martineau : 2.031539 secs (1.578251x)
           jay : 7.817713 secs (6.073384x)

As you can see, J.F. Sebastian's suggestion is the fastest, even with the modification I put in to make it the same as the others.

martineau
  • 119,623
  • 25
  • 170
  • 301
1

Here's another way:

def getVal(d, keys, default):
    keys = keys.split(".")  # You can avoid this first step if you're willing to use a list like ["1", "2", "3"...] as an input instead of a string like "1.2.3..."
    for key in keys:
        try:
            d = d[key]
        except KeyError:
            return default
    return d

I can profile it if you want - let me know. Keep in mind there's no point in optimizing unless you've encountered or have reason to believe you will encounter a bottleneck.

Brionius
  • 13,858
  • 3
  • 38
  • 49
  • This would also need the type check: `if not isinstance(d, dict): return default` (because d[key] on a non-dict will not necessarily raise KeyError). – nmclean Aug 20 '13 at 17:21
  • Could be, if the keys and values of the dict tree aren't always strings. I'll leave that to the OP to add in, if their sample input is not representative. – Brionius Aug 20 '13 at 18:16
  • +1 For performance based on benchmark results and a certain amount of elegance. – martineau Aug 22 '13 at 09:54