Most pythonic and fastest way to create a list of key value pairs from a set of nested dictionaries?

Question

I have come up with the following solution, but it was quite ugly (see original solution). I'm fairly happy with the revised solution. Anybody have a cleaner / faster way to accomplish the same output?

Other requirements:

Must accept any value and return a list of key value pairs.
The final key must track the list of keys to access the value with dot syntax.
must return a list of key value pairs or a dictionary.
must remove leading . when no base_key is supplied.

My revised solution:

def create_nested_kvl(v, base_key=None):
    kvl = []
    if not isinstance(v, dict):
        kvl.append((base_key,v))
    else:
        def iterate(v, k):
            for ki, vi in v.items():
                ki = '%s.%s' % (k, ki) if k else ki
                iterate(vi, ki) if isinstance(vi, dict) else kvl.append((ki, vi))
        iterate(v, base_key)
    return kvl

My Original Solution:

def create_nested_kvl(v, base_key=''):
    """ Creates a list of dot syntax key value pairs from a nested dictionary.
    :param      v: The value suspected to be a nested dictionary.
    :param      k: Base key
    :return:    [(k,v)]
    :rtype:     list
    """
    if not isinstance(v, dict):
        return [(base_key,v)]

    kvl = []
    def iterate(v, k):
        for kd, vd in v.items():
            v = vd
            kd = '%s.%s' % (k, kd) if k else kd
            kvl.append((kd, v))

    iterate(v, base_key)
    for k, v in kvl:
        if isinstance(v, dict):
            iterate(v, k)
            kvl.remove((k,v))
    return kvl

input:

v = {'type1':'type1_val',
     'type2':'type2_val',
     'object': {
          'k1': 'val1',
          'k2': 'val2',
          'k3': {'k31': {
                     'k311': 'val311',
                     'k322': 'val322',
                     'k333': 'val333'
                     },
                'k32': 'val32',
                'k33': 'val33'}}}

create_nested_kvl(v, 'base')

output:

[('base.type1', 'type1_val'),
 ('base.type2', 'type2_val'),
 ('base.object.k2', 'val2'),
 ('base.object.k1', 'val1'),
 ('base.object.k3.k33', 'val33'),
 ('base.object.k3.k32', 'val32'),
 ('base.object.k3.k31.k311', 'val311'),
 ('base.object.k3.k31.k333', 'val333'),
 ('base.object.k3.k31.k322', 'val322')]

Notes:

The generator solution presented by Alex Martelli is very slick. Unfortunately, it appears to be a tad slower than my first and revised solution. Also, it returns a generator which still needs to be converted to a list or poof, its gone.

timeit results @ number=1000000:

generator : 0.911420848311 (see alex's answer)
original  : 0.720069713321
revised   : 0.660259814902

best      : 0.660259814902 
* as Alex pointed out, my late night rounding skills are horrific.
It's 27% faster not twice as fast (my bad).

I think i must have answered this at least twice. Let me find a link. http://stackoverflow.com/a/28131507/3467349 — user3467349, Feb 04 '15 at 02:49
@dawg -- Except for the part where OP modifies a list while iterating over it :-)... probably want `for k, v in kvl[:]:` or something similar. — mgilson, Feb 04 '15 at 02:59
sigh =/ There should have been a canonical question for this. — user3467349, Feb 04 '15 at 03:02
How is 0.66 "almost twice as fast" as 0.91? `((0.91-0.66)/0.91)` is `0.27` -- 27% isn't "almost twice as fast". — Alex Martelli, Feb 04 '15 at 15:29
Alex, you are correct. My rounding skills clearly need some work! — Arctelix, Feb 04 '15 at 20:52

Alex Martelli · Answer 1 · 2015-02-04T15:26:53.313

Apart from ordering of keys in dicts being arbitrary, and the possible need to trim leading .s if that's needed for empty keys (spec unclear):

def create_nested_kvl(v, k=''):
    if isinstance(v, dict):
        for tk in v:
            for sk, sv in create_nested_kvl(v[tk], tk):
                yield '{}.{}'.format(k, sk), sv
    else:
        yield k, v

seems nice and compact. E.g:

v = {'type1':'type1_val',
     'type2':'type2_val',
     'object': {
          'k1': 'val1',
          'k2': 'val2',
          'k3': {'k31': {
                     'k311': 'val311',
                     'k322': 'val322',
                     'k333': 'val333'
                     },
                'k32': 'val32',
                'k33': 'val33'}}}

import pprint
pprint.pprint(list(create_nested_kvl(v, 'base')))

emits

[('base.object.k3.k31.k311', 'val311'),
 ('base.object.k3.k31.k333', 'val333'),
 ('base.object.k3.k31.k322', 'val322'),
 ('base.object.k3.k33', 'val33'),
 ('base.object.k3.k32', 'val32'),
 ('base.object.k2', 'val2'),
 ('base.object.k1', 'val1'),
 ('base.type1', 'type1_val'),
 ('base.type2', 'type2_val')]

as required.

Added: in Python, "fast" and "elegant" often coincide -- but not always so. In particular, recursion is slightly slower and so are lookups of globals in loop. So, here, pulling all the usual tricks for recursion elimination w/an explicit stack, and lookup hoisting, one can get...:

def faster(v, k='', isinstance=isinstance):
    stack = [(k, v)]
    result = []
    push, pop = stack.append, stack.pop
    resadd = result.append
    fmt = '{}.{}'.format
    while stack:
        k, v = pop()
        if isinstance(v, dict):
            for tk, vtk in v.iteritems():
                push((fmt(k, tk), vtk))
        else:
            resadd((k, v))
    return result

...definitely not as elegant, but... on my laptop, my original version, plus a list() at the end, takes 21.5 microseconds on the given sample v; this faster version takes 16.8 microseconds. If saving those 4.7 microseconds (or, expressed more meaningfully, 22% of the original runtime) is more important than clarity and maintainability, then one can pick the second version and get the same results (net as usual of ordering) that much faster.

The OP's "revised version" is still faster on the sample v, partly because formatting with % is slightly faster in Python 2 than the more elegant format, and partly because items is slightly faster (again, Python 2 only) than iteritems; and some hoisting might further shave some nanoseconds off that one, too.

This is really awesome! However, it's a generator and a bit slower than my solution when I test it with timeit. Then i still need to make it a dict or list since i will be iterating this a few times down the road. — Arctelix, Feb 04 '15 at 08:01
Added a faster version (recursion elimination, hoisting, fallback to appending from `yield` which was needed only for the recursion itself) but your revised version is still faster. However you might want to edit your Q's subject by asking for "fastest" rather than "most Pythonic" (and measure on huge dicts where the difference is meaningful rather than microseconds -- I'm not sure the two approaches scale up identically as the dicts' size grow to where speed is meaningful). — Alex Martelli, Feb 04 '15 at 15:33
Thanks for your help on this, your solutions are quite genius! I am a fairly new developer and am solidifying my understand the term "pythonic" (seems to be some controversy). Any how, my revised solution is short and, at lest to me, very readable. Plus fast, which is important due to the frequency that it will be used in my code. With that being said, would you favor your first or second solution to my revised? — Arctelix, Feb 04 '15 at 21:04
I'd go with my first solution (with minor tweaks) for readability and style. Before compromising either to gain 4-5 microseconds (on a Macbook Air, hardly the fastest machine available -- would be much less on a powerful server) I'd want proof that the tiny gain is crucial; if so I'd start with the recursion-elimination refactor (doesn't really impact readability much) and only in dire performance emergencies would I push to such extremes as hoisting, items vs iteritems, % vs format, abuse of if/else expressions as statements, and other desperation measures:-). — Alex Martelli, Feb 04 '15 at 22:00

Most pythonic and fastest way to create a list of key value pairs from a set of nested dictionaries?

1 Answers1