6

I've been using imran's great answer to flatten nested Python dictionaries, compressing keys and am trying to think of a way to further flatten the dictionaries that may be inside list values of the dictionary items.
(Of course, as my data is usually coming from XML this can also be recursive...)

from pprint import pprint
from collections import MutableMapping

def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

Given a dict d like this:

d = {"a": 1,
     "b": 2,
     "c": {"sub-a": "one",
           "sub-b": "two",
           "sub-c": "thre"}}

this works great:

pprint(flatten(d))

        {'a': 1,
         'b': 2,
         'c_sub-a': 'one',
         'c_sub-b': 'two',
         'c_sub-c': 'thre'}

However, I would like to further recur through a list value of the dict items, and inspect if that each dict in the list can be further flattened.

Here's an example of sample input with c-list as a nested list value:

d = {"a": 1,
     "b": 2,
     "c-list": [
         {"id": 1, "nested": {"sub-a": "one", "sub-b": "two", "sub-c": "thre"} },
         {"id": 2, "nested": {"sub-a": "one", "sub-b": "two", "sub-c": "thre"} },
         {"id": 3, "nested": {"sub-a": "one", "sub-b": "two", "sub-c": "thre"} }]}

Here's what I currently get with the function above:

pprint(flatten(d))

{'a': 1,
 'b': 2,
 'c-list': [{'id': 1, 'nested': {'sub-a': 'one', 'sub-b': 'two', 'sub-c': 'thre'}},
            {'id': 2, 'nested': {'sub-a': 'one', 'sub-b': 'two', 'sub-c': 'thre'}},
            {'id': 3, 'nested': {'sub-a': 'one', 'sub-b': 'two', 'sub-c': 'thre'}}]}

Below is the output I'm looking for, retaining all the functionality of the original flatten():

{'a': 1,
 'b': 2,
 'c-list': [{'id': 1, 'nested_sub-a': 'one', 'nested_sub-b': 'two', 'nested_sub-c': 'thre'},
            {'id': 2, 'nested_sub-a': 'one', 'nested_sub-b': 'two', 'nested_sub-c': 'thre'},
            {'id': 3, 'nested_sub-a': 'one', 'nested_sub-b': 'two', 'nested_sub-c': 'thre'}]}

I'm struggling to figure out how to recursively "re-assemble" the dict into this when it contains lists... any tips appreciated.

mx0
  • 6,445
  • 12
  • 49
  • 54
joefromct
  • 1,506
  • 13
  • 33

2 Answers2

3

You were really close, if a value is a list, then a single line is needed to gets you to a recursive version of flatten:

items.append((new_key, map(flatten, v)))  # for python 2.x
# or
items.append((new_key, list(map(flatten, v))))  # for python 3.x

So, you simply recursively call the function on each element.

Here is how flatten then would look like:

def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = '{0}{1}{2}'.format(parent_key,sep,k) if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # apply itself to each element of the list - that's it!
            items.append((new_key, map(flatten, v)))
        else:
            items.append((new_key, v))
    return dict(items)

This solution can cope with an arbitrary depth of lists in lists.

j-i-l
  • 10,281
  • 3
  • 53
  • 70
  • 2
    Why are you using `lambda x: flatten(x)`, not just `flatten`? – Solomon Ucko Dec 01 '17 at 22:43
  • Interestingly, with my sample the output looks like this, being a map object rather than a dict: `{'a': 1, 'b': 2, 'c-list': }` – joefromct Dec 01 '17 at 22:45
  • @ah you are using python 3, right?! I used python 2 where `map` returns a `list`. Simply change to `list(map(...` and you're good! – j-i-l Dec 01 '17 at 22:51
  • 1
    Yes, python 3.6. I wrapped your `lambda` in a `list()` and i got the output i need. :) Like so `list(map(lambda x: flatten(x), v))`. – joefromct Dec 01 '17 at 22:52
  • @joefromct: please note that `list(map(flatten, v))` is completely fine and probably the [fastest solution](https://stackoverflow.com/questions/1247486/python-list-comprehension-vs-map#1247490). So there is actually no need for `lambda` in this case. – j-i-l Dec 01 '17 at 23:09
  • 1
    @SolomonUcko you are completely right! The `lambda` made it in there because of me *ctrl-c ctrl-v* this part form one of my scripts. :P Damn you copy-paste! – j-i-l Dec 01 '17 at 23:11
  • Lol. It seems a lot of people accidentally use lambda unnecessarily, by accident. – Solomon Ucko Dec 02 '17 at 01:13
  • 1
    FYI, if one wanted to use a different `sep` than the default, i believe we would need the lambda. for instance, passing in `sep='.'` would require the lambda, probably something like this: `map(lambda x: flatten(x, parent='', sep=sep), v)`. (the `sep=sep` would reference the `sep` originally passed in...) – joefromct Dec 02 '17 at 03:05
  • The recursive call will break if the list contains something other than a `MutableMapping` (e.g. `[1,2,3]`). You can fix that by checking whether `isinstance(d, MutableMapping)` before calling `items()`. If `d` is not a mapping, it's already "flat". – Rafael Barbosa Aug 17 '18 at 10:04
1

Simply, do a flatten for each of the dictionaries inside the list, collect them into a new list and append it to items with the original key

def flatten(d, parent_key='', sep='_'):
        items = []
        for k, v in d.items():
            new_key = parent_key + sep + k if parent_key else k
            if isinstance(v, MutableMapping):
                items.extend(flatten(v, new_key, sep=sep).items())
            elif type(v) == list:
                items.append((new_key, [flatten(i) for i in v]))
            else:
                items.append((new_key, v))
        return dict(items)
Aakash Verma
  • 3,705
  • 5
  • 29
  • 66
  • 1
    but when you call `flatten` within `flatten` the you are also doing *recursion*! – dendragon Dec 01 '17 at 22:58
  • @dendragon That is not what I'm talking about. That will anyway happen if there are dictionaries under dictionaries in the original code by imran. – Aakash Verma Dec 01 '17 at 23:00
  • but what you write: _iteration is faster than recursion_, does then not make much sense – dendragon Dec 01 '17 at 23:03
  • Yeah, I see. I thought I am creating a new and different stack of recursion which would not hit the main stack. – Aakash Verma Dec 01 '17 at 23:08
  • List comprehension would indeed be faster than `map(lambda(..` calls. For this particular case `map` alone does the trick and there is [if any, a little speed advantage of `map`](https://stackoverflow.com/questions/1247486/python-list-comprehension-vs-map#1247490). – j-i-l Dec 01 '17 at 23:15
  • The simplicity and compatibility is much better here, though – Aakash Verma Dec 02 '17 at 03:48
  • Why is list comprehension more compatible than `map`? IMHO `map` is also easier to read. – dendragon Dec 02 '17 at 12:05