299

Suppose you have a dictionary like:

{'a': 1,
 'c': {'a': 2,
       'b': {'x': 5,
             'y' : 10}},
 'd': [1, 2, 3]}

How would you go about flattening that into something like:

{'a': 1,
 'c_a': 2,
 'c_b_x': 5,
 'c_b_y': 10,
 'd': [1, 2, 3]}
codeforester
  • 39,467
  • 16
  • 112
  • 140
A Timmes
  • 3,093
  • 3
  • 16
  • 5
  • 7
    also, there is a library for it: https://github.com/ianlini/flatten-dict – Ufos Aug 31 '18 at 16:34
  • 1
    **see also:** https://stackoverflow.com/questions/14692690 – dreftymac Jan 18 '19 at 03:39
  • I see very different performance for the approaches suggested in the answers. – Fontanka16 Jun 17 '21 at 15:30
  • The question should have at the end: "so that all levels' keys on the path to the leaf are concatenated?" Or change the header to "compressing (= concatenating) keys". There should be "concatenat" in the question for those who search. I was searching for a solution that would give a **list** of the keys on the path to the leaf, not a concatenation. You could say use `split()` then, but there are other direct ways that this question does not encourage. – questionto42 Sep 14 '21 at 20:48

32 Answers32

326

Basically the same way you would flatten a nested list, you just have to do the extra work for iterating the dict by key/value, creating new keys for your new dictionary and creating the dictionary at final step.

from collections.abc import MutableMapping

def flatten(dictionary, parent_key='', separator='_'):
    items = []
    for key, value in dictionary.items():
        new_key = parent_key + separator + key if parent_key else key
        if isinstance(value, MutableMapping):
            items.extend(flatten(value, new_key, separator=separator).items())
        else:
            items.append((new_key, value))
    return dict(items)

>>> flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]})
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
Ophir Carmi
  • 2,701
  • 1
  • 23
  • 42
Imran
  • 87,203
  • 23
  • 98
  • 131
  • 8
    If you replace the `isinstance` with a `try..except` block, this will work for any mapping, even if it is not derived from `dict`. – Björn Pollex May 17 '11 at 07:34
  • 2
    Changed it to test for `collections.MutableMapping` to make it more generic. But for Python < 2.6, `try..except` is probably the best option. – Imran May 17 '11 at 07:55
  • 8
    If you want empty dictionaries preserved in flattened version you might want to change `if isinstance(v, collections.MutableMapping):` to `if v and isinstance(v, collections.MutableMapping):` – tarequeh Sep 06 '13 at 00:19
  • exactly what i wanted... thanks!!! remember to import collections when using this. – ihightower May 10 '15 at 11:53
  • 4
    Note that `new_key = parent_key + sep + k if parent_key else k` assumes that keys are always strings, otherwise it will raise `TypeError: cannot concatenate 'str' and [other] objects`. However, you could fix that by simply coercing `k` to string (`str(k)`), or concatenating keys into a tuple instead of a string (tuples can be dict keys, too). – Scott H Jun 29 '15 at 21:09
  • 1
    And the inflate function is [here](https://gist.github.com/fmder/494aaa2dd6f8c428cede) – mitch Jan 26 '16 at 20:54
  • That is cool. I was looking for a way to have a program reference a dict, like a xpath, this worked great! – Greg Feb 29 '16 at 23:30
  • You can replace `items = []; ... return dict(items)` with `result = {}; ... return result`. This will work slightly faster. – Mark Mishyn Oct 15 '18 at 09:22
  • 1
    What if I want to flatten an array and objects inside it? This is not working for arrays. For eg. `{'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}` should give me `{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd_0': 1, 'd_1': 2,'d_2': 3, 'c_b_y': 10}` – roneo May 28 '19 at 11:34
  • 2
    Answered my own query: I added one "elif" and that did the trick... `elif isinstance(v,list): for idx,val in enumerate(v): new_key = str(parent_key) + sep + str(k) + sep + str(idx) if parent_key else str(k) + sep + str(idx) items.extend(Controller.flatten(v[idx],new_key,sep=sep).items())` – roneo May 28 '19 at 12:01
  • If using Python 3.5+, the dictionary unpacking seems to be [the fastest](https://gist.github.com/treyhunner/f35292e676efa0be1728), so do `items = {**items, **flatten(v, new_key, sep=sep)}` – Abhijit Sarkar Aug 27 '20 at 05:49
  • im new using collections, but what does collections.MutableMapping do? Ive looked online and cant find a good explanation – clumbzy1 Oct 15 '20 at 06:51
  • I created a dictionary like this similar to a case i had but this failed to run successfully on it . my_dict=xmltodict.parse("""X2X3X4X5X6X7X8X10X9X11X12""") The answered code doesnot work for this – Invictus Nov 06 '20 at 09:17
  • Actually these do not work for xml with repeating tags or sub-tree it seems. – Invictus Nov 06 '20 at 13:04
  • Is it possible to do this in C++? Which data structure would hold the input? – Maddy Nov 24 '20 at 05:55
  • @Maddy using boost property tree might help, check that. – Invictus Dec 02 '20 at 11:52
  • 1
    If you have integers as key, you need to change `k` to `str(k)`, else you get `in flatten(d, parent_key, sep) 4 items = [] 5 for k, v in d.items(): ----> 6 new_key = parent_key + sep + k if parent_key else k 7 if isinstance(v, collections.MutableMapping): 8 items.extend(flatten(v, new_key, sep=sep).items()) TypeError: can only concatenate str (not "int") to str`. – questionto42 Sep 12 '21 at 15:13
  • Wouldn't it be more correct to test for instances of `Mapping` instead of `MutableMapping` even though the standard library still lacks an frozenmap type, you may still come across user types that devive from `Mapping`. – Alex Jasmin Sep 13 '21 at 18:38
  • @roneo What happens in the ```if parent_key```statement? – RSale Jan 19 '22 at 09:40
  • 1
    @RSale Default boolean comparison (e.g. bool("") -> False, etc.) It's a somewhat "hacky" way to simulate `?:˙ C operator – Ate Somebits Jan 21 '22 at 23:26
  • 1
    If anyone wants to get list of only the keys. Here's my version inspired from this. https://gist.github.com/AmitChotaliya/3a1f1181c5d2ef7260432a8112e440de – bitkot Mar 11 '22 at 16:24
205

Or if you are already using pandas, You can do it with json_normalize() like so:

import pandas as pd

d = {'a': 1,
     'c': {'a': 2, 'b': {'x': 5, 'y' : 10}},
     'd': [1, 2, 3]}

df = pd.json_normalize(d, sep='_')

print(df.to_dict(orient='records')[0])

Output:

{'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
Mohammad Yusuf
  • 16,554
  • 10
  • 50
  • 78
  • 7
    or just pass the sep argument :) – Blue Moon Sep 25 '18 at 08:13
  • 7
    Bit of a shame it doesn't handle lists :) – Roelant Nov 27 '18 at 14:32
  • I think the most recent version is `df = pd.io.json.json_normalize(original, sep='_')` – Domi W Oct 01 '20 at 13:51
  • 8
    This is deprecated, the most recent is: `df = pd.json_normalize(d, sep='_')` – dreamflasher Oct 26 '20 at 17:44
  • In my application, the pandas approach really dragged down performance compared to @imran's answer – Fontanka16 Jun 17 '21 at 15:31
  • This does not work if you have integers as keys in the dictionary: `98 if level != 0: # so we skip copying for top level, common case ---> 99 v = new_d.pop(k) 100 new_d[newkey] = v 101 continue KeyError: '6'` – questionto42 Sep 12 '21 at 15:07
  • @MohammadYusuf As to your link, that is the condition on the JSON side. But for a Python dict, this is no longer true, only older versions still ask for string keys. If you want to use this JSON trick here, you need to stick to the JSON conditions, of course. – questionto42 Sep 13 '21 at 08:37
  • @questionto42 I see. Been long, you are right. Will check. – Mohammad Yusuf Sep 13 '21 at 09:41
  • 2
    @MohammadYusuf I could not convert keys to string using just a parameter in the `json_normalize` function. It is built-in on the JSON side. Perhaps, they will change it in future. It still is a compact one-liner and good for the standard case of string keys. – questionto42 Sep 13 '21 at 10:03
  • TIL - thanks! And I thought I knew pandas well – jason m Feb 09 '22 at 19:40
  • In Pandas v1.4.1, you'll need to chain a `.to_dict()` call onto the result otherwise the return type is a dataframe – Addison Klinke Apr 11 '22 at 16:11
  • 1
    Seeing some strange behavior with `pd.json_normalize({'hello': 'world'}, sep='__').to_dict()` returning `{'hello': {0: 'world'}}` which I reported as a bug on the [Pandas Github](https://github.com/pandas-dev/pandas/issues/46743) – Addison Klinke Apr 11 '22 at 16:25
81

There are two big considerations that the original poster needs to consider:

  1. Are there keyspace clobbering issues? For example, {'a_b':{'c':1}, 'a':{'b_c':2}} would result in {'a_b_c':???}. The below solution evades the problem by returning an iterable of pairs.
  2. If performance is an issue, does the key-reducer function (which I hereby refer to as 'join') require access to the entire key-path, or can it just do O(1) work at every node in the tree? If you want to be able to say joinedKey = '_'.join(*keys), that will cost you O(N^2) running time. However if you're willing to say nextKey = previousKey+'_'+thisKey, that gets you O(N) time. The solution below lets you do both (since you could merely concatenate all the keys, then postprocess them).

(Performance is not likely an issue, but I'll elaborate on the second point in case anyone else cares: In implementing this, there are numerous dangerous choices. If you do this recursively and yield and re-yield, or anything equivalent which touches nodes more than once (which is quite easy to accidentally do), you are doing potentially O(N^2) work rather than O(N). This is because maybe you are calculating a key a then a_1 then a_1_i..., and then calculating a then a_1 then a_1_ii..., but really you shouldn't have to calculate a_1 again. Even if you aren't recalculating it, re-yielding it (a 'level-by-level' approach) is just as bad. A good example is to think about the performance on {1:{1:{1:{1:...(N times)...{1:SOME_LARGE_DICTIONARY_OF_SIZE_N}...}}}})

Below is a function I wrote flattenDict(d, join=..., lift=...) which can be adapted to many purposes and can do what you want. Sadly it is fairly hard to make a lazy version of this function without incurring the above performance penalties (many python builtins like chain.from_iterable aren't actually efficient, which I only realized after extensive testing of three different versions of this code before settling on this one).

from collections import Mapping
from itertools import chain
from operator import add

_FLAG_FIRST = object()

def flattenDict(d, join=add, lift=lambda x:(x,)):
    results = []
    def visit(subdict, results, partialKey):
        for k,v in subdict.items():
            newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey,lift(k))
            if isinstance(v,Mapping):
                visit(v, results, newKey)
            else:
                results.append((newKey,v))
    visit(d, results, _FLAG_FIRST)
    return results

To better understand what's going on, below is a diagram for those unfamiliar with reduce(left), otherwise known as "fold left". Sometimes it is drawn with an initial value in place of k0 (not part of the list, passed into the function). Here, J is our join function. We preprocess each kn with lift(k).

               [k0,k1,...,kN].foldleft(J)
                           /    \
                         ...    kN
                         /
       J(k0,J(k1,J(k2,k3)))
                       /  \
                      /    \
           J(J(k0,k1),k2)   k3
                    /   \
                   /     \
             J(k0,k1)    k2
                 /  \
                /    \
               k0     k1

This is in fact the same as functools.reduce, but where our function does this to all key-paths of the tree.

>>> reduce(lambda a,b:(a,b), range(5))
((((0, 1), 2), 3), 4)

Demonstration (which I'd otherwise put in docstring):

>>> testData = {
        'a':1,
        'b':2,
        'c':{
            'aa':11,
            'bb':22,
            'cc':{
                'aaa':111
            }
        }
    }
from pprint import pprint as pp

>>> pp(dict( flattenDict(testData) ))
{('a',): 1,
 ('b',): 2,
 ('c', 'aa'): 11,
 ('c', 'bb'): 22,
 ('c', 'cc', 'aaa'): 111}

>>> pp(dict( flattenDict(testData, join=lambda a,b:a+'_'+b, lift=lambda x:x) ))
{'a': 1, 'b': 2, 'c_aa': 11, 'c_bb': 22, 'c_cc_aaa': 111}    

>>> pp(dict( (v,k) for k,v in flattenDict(testData, lift=hash, join=lambda a,b:hash((a,b))) ))
{1: 12416037344,
 2: 12544037731,
 11: 5470935132935744593,
 22: 4885734186131977315,
 111: 3461911260025554326}

Performance:

from functools import reduce
def makeEvilDict(n):
    return reduce(lambda acc,x:{x:acc}, [{i:0 for i in range(n)}]+range(n))

import timeit
def time(runnable):
    t0 = timeit.default_timer()
    _ = runnable()
    t1 = timeit.default_timer()
    print('took {:.2f} seconds'.format(t1-t0))

>>> pp(makeEvilDict(8))
{7: {6: {5: {4: {3: {2: {1: {0: {0: 0,
                                 1: 0,
                                 2: 0,
                                 3: 0,
                                 4: 0,
                                 5: 0,
                                 6: 0,
                                 7: 0}}}}}}}}}

import sys
sys.setrecursionlimit(1000000)

forget = lambda a,b:''

>>> time(lambda: dict(flattenDict(makeEvilDict(10000), join=forget)) )
took 0.10 seconds
>>> time(lambda: dict(flattenDict(makeEvilDict(100000), join=forget)) )
[1]    12569 segmentation fault  python

... sigh, don't think that one is my fault...


[unimportant historical note due to moderation issues]

Regarding the alleged duplicate of Flatten a dictionary of dictionaries (2 levels deep) of lists

That question's solution can be implemented in terms of this one by doing sorted( sum(flatten(...),[]) ). The reverse is not possible: while it is true that the values of flatten(...) can be recovered from the alleged duplicate by mapping a higher-order accumulator, one cannot recover the keys. (edit: Also it turns out that the alleged duplicate owner's question is completely different, in that it only deals with dictionaries exactly 2-level deep, though one of the answers on that page gives a general solution.)

ninjagecko
  • 88,546
  • 24
  • 137
  • 145
  • 3
    I am not sure if this is relevant to the question. This solution does not flatten a dictionary item of a list of dictionaries, i.e. {'a': [{'aa': 1}, {'ab': 2}]}. The flattenDict function can be altered easily to accommodate this case. – Stewbaca Mar 02 '16 at 19:25
  • Use `join(partialKey + '_',lift(k)` if you need the underscores as in the question. – questionto42 Sep 12 '21 at 14:52
  • If you have integers as keys in the dictionary, you need to change `lift(k)` to `str(lift(k))` to avoid `in visit(subdict, results, partialKey) 9 def visit(subdict, results, partialKey): 10 for k,v in subdict.items(): ---> 11 newKey = lift(k) if partialKey==_FLAG_FIRST else join(partialKey + ',',lift(k)) 12 if isinstance(v,Mapping): 13 visit(v, results, newKey) TypeError: can only concatenate str (not "int") to str`. – questionto42 Sep 12 '21 at 15:08
  • 1
    @questionto42: There is no change necessary; that is the purpose of the `lift` parameter. You can just set `flattenDict(..., join=lambda a,b:a+'_'+b, lift=repr)` (or `str` but that's not a good idea due to key collsions 1<->'1') rather than leaving lift as the identity function and modifying the general-purpose code. – ninjagecko Sep 12 '21 at 20:05
  • Now that looks like higher science :) I see your point with the collisions, though. – questionto42 Sep 12 '21 at 20:19
51

If you're using pandas there is a function hidden in pandas.io.json._normalize1 called nested_to_record which does this exactly.

from pandas.io.json._normalize import nested_to_record    

flat = nested_to_record(my_dict, sep='_')

1 In pandas versions 0.24.x and older use pandas.io.json.normalize (without the _)

Aaron N. Brock
  • 4,276
  • 2
  • 25
  • 43
  • 2
    What worked for me was `from pandas.io.json._normalize import nested_to_record`. Notice the underscore (`_`) before `normalize`. – Eyal Levin Oct 06 '19 at 12:20
  • 3
    @EyalLevin Good catch! This changed in `0.25.x`, I've updated the answer. :) – Aaron N. Brock Oct 07 '19 at 13:57
  • 1
    This does not work if you have integers as keys in the dictionary. `--> 103 v = new_d.pop(k) 104 new_d.update(nested_to_record(v, newkey, sep, level + 1, max_level)) 105 new_ds.append(new_d) KeyError: '6'` – questionto42 Sep 12 '21 at 14:59
38

Not exactly what the OP asked, but lots of folks are coming here looking for ways to flatten real-world nested JSON data which can have nested key-value json objects and arrays and json objects inside the arrays and so on. JSON doesn't include tuples, so we don't have to fret over those.

I found an implementation of the list-inclusion comment by @roneo to the answer posted by @Imran :

https://github.com/ScriptSmith/socialreaper/blob/master/socialreaper/tools.py#L8

import collections
def flatten(dictionary, parent_key=False, separator='.'):
    """
    Turn a nested dictionary into a flattened dictionary
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :return: A flattened dictionary
    """

    items = []
    for key, value in dictionary.items():
        new_key = str(parent_key) + separator + key if parent_key else key
        if isinstance(value, collections.abc.MutableMapping):
            items.extend(flatten(value, new_key, separator).items())
        elif isinstance(value, list):
            for k, v in enumerate(value):
                items.extend(flatten({str(k): v}, new_key).items())
        else:
            items.append((new_key, value))
    return dict(items)

Test it:

flatten({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3] })

>> {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd.0': 1, 'd.1': 2, 'd.2': 3}

Annd that does the job I need done: I throw any complicated json at this and it flattens it out for me.

All credits to https://github.com/ScriptSmith .

2023-06-14 Update for python >= 3.10

Since Python 3.10, collections.MutableMapping has changed to collections.abc.MutableMapping. Hence code above is edited to reflect the same. If your python version is before 3.10, please change it back to collections.MutableMapping at your side.
Ref: https://stackoverflow.com/a/71902541/4355695

Nikhil VJ
  • 5,630
  • 7
  • 34
  • 55
  • 4
    This is my favorite answer so far since it handles nested lists of dicts. – shellcat_zero Apr 30 '21 at 20:11
  • 1
    Thanks. I think this is best one, as it works with lists as well. – Marcin Nov 23 '21 at 05:25
  • In case: any comes across an error: `AttributeError: module 'collections' has no attribute 'MutableMapping'` Use, `from collections.abc import MutableMapping`. `collections.MutableMapping` has been deprecated. – anyfactor Oct 16 '22 at 00:34
  • what about creating multiple rows of data – geekidharsh May 19 '23 at 20:34
  • @geekidharsh didn't get you.. if you're referring to a list of nested dicts you want to flatten, then: assuming `data` holds the list.. `data_flat = [flatten(row) for row in data]` – Nikhil VJ Jun 14 '23 at 17:52
37

Here is a kind of a "functional", "one-liner" implementation. It is recursive, and based on a conditional expression and a dict comprehension.

def flatten_dict(dd, separator='_', prefix=''):
    return { prefix + separator + k if prefix else k : v
             for kk, vv in dd.items()
             for k, v in flatten_dict(vv, separator, kk).items()
             } if isinstance(dd, dict) else { prefix : dd }

Test:

In [2]: flatten_dict({'abc':123, 'hgf':{'gh':432, 'yu':433}, 'gfd':902, 'xzxzxz':{"432":{'0b0b0b':231}, "43234":1321}}, '.')
Out[2]: 
{'abc': 123,
 'gfd': 902,
 'hgf.gh': 432,
 'hgf.yu': 433,
 'xzxzxz.432.0b0b0b': 231,
 'xzxzxz.43234': 1321}
dividebyzero
  • 2,190
  • 1
  • 21
  • 33
  • 1
    This doesn't work for general dictionaries, specifically, with tuple keys, eg substitute `('hgf',2)` for the 2nd key in your test throws `TypeError` – alancalvitti Jul 03 '19 at 19:22
  • 1
    @alancalvitti This assumes it to be a string, or something else that supports the `+` operator. For anything else you'll need to adapt `prefix + separator + k` to the appropriate function call to compose the objects. – dividebyzero Jul 05 '19 at 13:31
  • 1
    Another issue relevant to tuple keys. I've posted separately how to generalize based on your method. However it cannot correctly handle ninjageko's example: `{'a_b':{'c':1}, 'a':{'b_c':2}}` – alancalvitti Jul 05 '19 at 15:39
  • 7
    I was getting worried, seeing no answers utilizing recursion. What's wrong with our youth these days? – Jakov Nov 15 '19 at 09:47
  • 1
    does nothing if a dict has nested list of dicts, like this: `{'name': 'Steven', 'children': [{'name': 'Jessica', 'children': []}, {'name': 'George', 'children': []}]}` – Gergely M May 25 '20 at 10:34
  • @GergelyM during traversal we fundamentally assume that a value is either a dict, and therefore a node to be visited iterating over the keys, or a leaf otherwise. A list is just taken to be a non-traversable value. You probably need to adapt the algorithm to your application here, depending on what you need. What should even be the output here, and how does it relate to the OP question? – dividebyzero May 31 '20 at 09:40
  • Sure, not trying to question a good solution and you're absolutely correct - if we're taking granted the input dictionary contains only dictionaries. Though, it's more life-like and completely valid for a python dictionary to store other data-structures, a list of dictionaries for example. – Gergely M Jun 01 '20 at 16:49
  • If you have an integer in the keys, you need `str(k)` and `str(kk)`. – questionto42 Sep 12 '21 at 20:12
  • @Jakov: The top answers are recursive, and if you take the "json" trick as inherently recursive, all of the higher voted answers are recursive. – questionto42 Sep 12 '21 at 20:13
16

Code:

test = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}

def parse_dict(init, lkey=''):
    ret = {}
    for rkey,val in init.items():
        key = lkey+rkey
        if isinstance(val, dict):
            ret.update(parse_dict(val, key+'_'))
        else:
            ret[key] = val
    return ret

print(parse_dict(test,''))

Results:

$ python test.py
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}

I am using python3.2, update for your version of python.

Pavan Yalamanchili
  • 12,021
  • 2
  • 35
  • 55
  • You probably want to specify the default value of `lkey=''` in your function definition instead of when calling the function. See other answers in this regard. – Asclepius Dec 21 '12 at 10:55
8

This is not restricted to dictionaries, but every mapping type that implements .items(). Further ist faster as it avoides an if condition. Nevertheless credits go to Imran:

def flatten(d, parent_key=''):
    items = []
    for k, v in d.items():
        try:
            items.extend(flatten(v, '%s%s_' % (parent_key, k)).items())
        except AttributeError:
            items.append(('%s%s' % (parent_key, k), v))
    return dict(items)
Davoud Taghawi-Nejad
  • 16,142
  • 12
  • 62
  • 82
  • 2
    If `d` is not a `dict` but a custom mapping type that doesn't implement `items`, your function would fail right then and there. So, it it does not work for every mapping type but only those that implement `items()`. – user6037143 Feb 19 '19 at 23:08
  • 1
    @user6037143 have you ever encountered a mapping type that doesn't implement `items`? I'd be curious to see one. – Trey Hunner Apr 17 '19 at 23:35
  • 2
    @user6037143, no you haven't by definition if items is not implemented it's no mapping type. – Davoud Taghawi-Nejad Apr 18 '19 at 20:00
  • 1
    @DavoudTaghawi-Nejad, could you modify this to handle general keys eg tuples which should not be internally flattened. – alancalvitti Jul 03 '19 at 19:48
7

How about a functional and performant solution in Python3.5?

from functools import reduce


def _reducer(items, key, val, pref):
    if isinstance(val, dict):
        return {**items, **flatten(val, pref + key)}
    else:
        return {**items, pref + key: val}

def flatten(d, pref=''):
    return(reduce(
        lambda new_d, kv: _reducer(new_d, *kv, pref), 
        d.items(), 
        {}
    ))

This is even more performant:

def flatten(d, pref=''):
    return(reduce(
        lambda new_d, kv: \
            isinstance(kv[1], dict) and \
            {**new_d, **flatten(kv[1], pref + kv[0])} or \
            {**new_d, pref + kv[0]: kv[1]}, 
        d.items(), 
        {}
    ))

In use:

my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]}

print(flatten(my_obj)) 
# {'d': [1, 2, 3], 'cby': 10, 'cbx': 5, 'ca': 2, 'a': 1}
Rotareti
  • 49,483
  • 23
  • 112
  • 108
  • 2
    How about a readable and working solution? ;) Which version did you test this on? I'm Getting "Syntax error" when trying this out in Python 3.4.3. Seems that usage of "**all" is not legit. – Wolkenarchitekt Nov 22 '17 at 12:01
  • I works since Python 3.5. Didn't know it doesn't work with 3.4. You're right this isn't very readable. I updated the answer. Hope it's more readable now. :) – Rotareti Nov 22 '17 at 14:28
  • 1
    Added missing reduce import. Still find the code hard to understand and I think it's a good example why Guido van Rossum himself discouraged the usage of lambda, reduce, filter and map in 2005 already: https://www.artima.com/weblogs/viewpost.jsp?thread=98196 – Wolkenarchitekt Nov 23 '17 at 09:40
  • I agree. Python isn't really designed for *functional programming*. Still I think `reduce` is great in case you need to reduce dictionaries. I updated the answer. Should look a little more pythonic now. – Rotareti Nov 23 '17 at 10:02
7

If you are a fan of pythonic oneliners:

my_dict={'a': 1,'c': {'a': 2,'b': {'x': 5,'y' : 10}},'d': [1, 2, 3]}

list(pd.json_normalize(my_dict).T.to_dict().values())[0]

returns:

{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'd': [1, 2, 3]}

You can leave the [0] from the end, if you have a list of dictionaries and not just a single dictionary.

csaladenes
  • 1,100
  • 1
  • 11
  • 27
6

My Python 3.3 Solution using generators:

def flattenit(pyobj, keystring=''):
   if type(pyobj) is dict:
     if (type(pyobj) is dict):
         keystring = keystring + "_" if keystring else keystring
         for k in pyobj:
             yield from flattenit(pyobj[k], keystring + k)
     elif (type(pyobj) is list):
         for lelm in pyobj:
             yield from flatten(lelm, keystring)
   else:
      yield keystring, pyobj

my_obj = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y': 10}}, 'd': [1, 2, 3]}

#your flattened dictionary object
flattened={k:v for k,v in flattenit(my_obj)}
print(flattened)

# result: {'c_b_y': 10, 'd': [1, 2, 3], 'c_a': 2, 'a': 1, 'c_b_x': 5}
EBo
  • 355
  • 1
  • 3
  • 17
Atul
  • 61
  • 2
  • 3
  • can you extend to handle any valid key type other than str (including tuple)? Instead of string concatenation, join them in a tuple. – alancalvitti Jun 13 '19 at 21:23
6

Utilizing recursion, keeping it simple and human readable:

def flatten_dict(dictionary, accumulator=None, parent_key=None, separator="."):
    if accumulator is None:
        accumulator = {}

    for k, v in dictionary.items():
        k = f"{parent_key}{separator}{k}" if parent_key else k
        if isinstance(v, dict):
            flatten_dict(dictionary=v, accumulator=accumulator, parent_key=k)
            continue

        accumulator[k] = v

    return accumulator

Call is simple:

new_dict = flatten_dict(dictionary)

or

new_dict = flatten_dict(dictionary, separator="_")

if we want to change the default separator.

A little breakdown:

When the function is first called, it is called only passing the dictionary we want to flatten. The accumulator parameter is here to support recursion, which we see later. So, we instantiate accumulator to an empty dictionary where we will put all of the nested values from the original dictionary.

if accumulator is None:
    accumulator = {}

As we iterate over the dictionary's values, we construct a key for every value. The parent_key argument will be None for the first call, while for every nested dictionary, it will contain the key pointing to it, so we prepend that key.

k = f"{parent_key}{separator}{k}" if parent_key else k

In case the value v the key k is pointing to is a dictionary, the function calls itself, passing the nested dictionary, the accumulator (which is passed by reference, so all changes done to it are done on the same instance) and the key k so that we can construct the concatenated key. Notice the continue statement. We want to skip the next line, outside of the if block, so that the nested dictionary doesn't end up in the accumulator under key k.

if isinstance(v, dict):
    flatten_dict(dict=v, accumulator=accumulator, parent_key=k)
    continue

So, what do we do in case the value v is not a dictionary? Just put it unchanged inside the accumulator.

accumulator[k] = v

Once we're done we just return the accumulator, leaving the original dictionary argument untouched.

NOTE

This will work only with dictionaries that have strings as keys. It will work with hashable objects implementing the __repr__ method, but will yield unwanted results.

Sidak
  • 187
  • 1
  • 10
Jakov
  • 720
  • 1
  • 12
  • 29
4

Simple function to flatten nested dictionaries. For Python 3, replace .iteritems() with .items()

def flatten_dict(init_dict):
    res_dict = {}
    if type(init_dict) is not dict:
        return res_dict

    for k, v in init_dict.iteritems():
        if type(v) == dict:
            res_dict.update(flatten_dict(v))
        else:
            res_dict[k] = v

    return res_dict

The idea/requirement was: Get flat dictionaries with no keeping parent keys.

Example of usage:

dd = {'a': 3, 
      'b': {'c': 4, 'd': 5}, 
      'e': {'f': 
                 {'g': 1, 'h': 2}
           }, 
      'i': 9,
     }

flatten_dict(dd)

>> {'a': 3, 'c': 4, 'd': 5, 'g': 1, 'h': 2, 'i': 9}

Keeping parent keys is simple as well.

H.Scheidl
  • 815
  • 3
  • 11
  • 25
Ivy Growing
  • 2,688
  • 2
  • 17
  • 23
4

I was thinking of a subclass of UserDict to automagically flat the keys.

class FlatDict(UserDict):
    def __init__(self, *args, separator='.', **kwargs):
        self.separator = separator
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            for k1, v1 in FlatDict(value, separator=self.separator).items():
                super().__setitem__(f"{key}{self.separator}{k1}", v1)
        else:
            super().__setitem__(key, value)

‌ The advantages it that keys can be added on the fly, or using standard dict instanciation, without surprise:

>>> fd = FlatDict(
...    {
...        'person': {
...            'sexe': 'male', 
...            'name': {
...                'first': 'jacques',
...                'last': 'dupond'
...            }
...        }
...    }
... )
>>> fd
{'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond'}
>>> fd['person'] = {'name': {'nickname': 'Bob'}}
>>> fd
{'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond', 'person.name.nickname': 'Bob'}
>>> fd['person.name'] = {'civility': 'Dr'}
>>> fd
{'person.sexe': 'male', 'person.name.first': 'jacques', 'person.name.last': 'dupond', 'person.name.nickname': 'Bob', 'person.name.civility': 'Dr'}
loutre
  • 874
  • 8
  • 16
  • 1
    Assigning to fd['person'] but maintaining its existing value is quite surprising. That's not how regular dicts work. – tbm Mar 23 '20 at 15:53
3

This is similar to both imran's and ralu's answer. It does not use a generator, but instead employs recursion with a closure:

def flatten_dict(d, separator='_'):
  final = {}
  def _flatten_dict(obj, parent_keys=[]):
    for k, v in obj.iteritems():
      if isinstance(v, dict):
        _flatten_dict(v, parent_keys + [k])
      else:
        key = separator.join(parent_keys + [k])
        final[key] = v
  _flatten_dict(d)
  return final

>>> print flatten_dict({'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]})
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
Jonathan Drake
  • 270
  • 1
  • 4
  • 13
  • 2
    I am not sure if using the term "[closure](http://en.wikipedia.org/wiki/Closure_(computer_science))" is correct here, as the function `_flatten_dict` is never returned, nor is it expected to ever be returned. It can perhaps be referred to as a _subfunction_ or an _enclosed function_ instead. – Asclepius Dec 21 '12 at 10:59
3

The answers above work really well. Just thought I'd add the unflatten function that I wrote:

def unflatten(d):
    ud = {}
    for k, v in d.items():
        context = ud
        for sub_key in k.split('_')[:-1]:
            if sub_key not in context:
                context[sub_key] = {}
            context = context[sub_key]
        context[k.split('_')[-1]] = v
    return ud

Note: This doesn't account for '_' already present in keys, much like the flatten counterparts.

tarequeh
  • 1,799
  • 18
  • 18
3

Davoud's solution is very nice but doesn't give satisfactory results when the nested dict also contains lists of dicts, but his code be adapted for that case:

def flatten_dict(d):
    items = []
    for k, v in d.items():
        try:
            if (type(v)==type([])): 
                for l in v: items.extend(flatten_dict(l).items())
            else: 
                items.extend(flatten_dict(v).items())
        except AttributeError:
            items.append((k, v))
    return dict(items)
3
def flatten(unflattened_dict, separator='_'):
    flattened_dict = {}

    for k, v in unflattened_dict.items():
        if isinstance(v, dict):
            sub_flattened_dict = flatten(v, separator)
            for k2, v2 in sub_flattened_dict.items():
                flattened_dict[k + separator + k2] = v2
        else:
            flattened_dict[k] = v

    return flattened_dict
MarredCheese
  • 17,541
  • 8
  • 92
  • 91
Pari Rajaram
  • 422
  • 3
  • 7
3

I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often!

I think the following code would give you exactly what you're after:

from cherrypicker import CherryPicker

dct = {
    'a': 1,
    'c': {
        'a': 2,
        'b': {
            'x': 5,
            'y' : 10
        }
    },
    'd': [1, 2, 3]
}

picker = CherryPicker(dct)
picker.flatten().get()

You can install the package with:

pip install cherrypicker

...and there's more docs and guidance at https://cherrypicker.readthedocs.io.

Other methods may be faster, but the priority of this package is to make such tasks easy. If you do have a large list of objects to flatten though, you can also tell CherryPicker to use parallel processing to speed things up.

big-o
  • 445
  • 4
  • 7
3

Using generators:

def flat_dic_helper(prepand,d):
    if len(prepand) > 0:
        prepand = prepand + "_"
    for k in d:
        i = d[k]
        if isinstance(i, dict):
            r = flat_dic_helper(prepand + k,i)
            for j in r:
                yield j
        else:
            yield (prepand + k,i)

def flat_dic(d):
    return dict(flat_dic_helper("",d))

d = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
print(flat_dic(d))


>> {'a': 1, 'c_a': 2, 'c_b_x': 5, 'd': [1, 2, 3], 'c_b_y': 10}
LMCuber
  • 113
  • 1
  • 11
Luka Rahne
  • 10,336
  • 3
  • 34
  • 56
  • 3
    `type(i).__name__=='dict'` could be replaced with `type(i) is dict` or perhaps even better `isinstance(d, dict)` (or `Mapping`/`MutableMapping`). – Cristian Ciupitu Jun 27 '14 at 18:21
3

here's a solution using a stack. No recursion.

def flatten_nested_dict(nested):
    stack = list(nested.items())
    ans = {}
    while stack:
        key, val = stack.pop()
        if isinstance(val, dict):
            for sub_key, sub_val in val.items():
                stack.append((f"{key}_{sub_key}", sub_val))
        else:
            ans[key] = val
    return ans
Matias Thayer
  • 571
  • 3
  • 8
2

Here's an algorithm for elegant, in-place replacement. Tested with Python 2.7 and Python 3.5. Using the dot character as a separator.

def flatten_json(json):
    if type(json) == dict:
        for k, v in list(json.items()):
            if type(v) == dict:
                flatten_json(v)
                json.pop(k)
                for k2, v2 in v.items():
                    json[k+"."+k2] = v2

Example:

d = {'a': {'b': 'c'}}                   
flatten_json(d)
print(d)
unflatten_json(d)
print(d)

Output:

{'a.b': 'c'}
{'a': {'b': 'c'}}

I published this code here along with the matching unflatten_json function.

Alexander Ryzhov
  • 2,705
  • 3
  • 19
  • 19
2

If you want to flat nested dictionary and want all unique keys list then here is the solution:

def flat_dict_return_unique_key(data, unique_keys=set()):
    if isinstance(data, dict):
        [unique_keys.add(i) for i in data.keys()]
        for each_v in data.values():
            if isinstance(each_v, dict):
                flat_dict_return_unique_key(each_v, unique_keys)
    return list(set(unique_keys))
Ranvijay Sachan
  • 2,407
  • 3
  • 30
  • 49
2

I always prefer access dict objects via .items(), so for flattening dicts I use the following recursive generator flat_items(d). If you like to have dict again, simply wrap it like this: flat = dict(flat_items(d))

def flat_items(d, key_separator='.'):
    """
    Flattens the dictionary containing other dictionaries like here: https://stackoverflow.com/questions/6027558/flatten-nested-python-dictionaries-compressing-keys

    >>> example = {'a': 1, 'c': {'a': 2, 'b': {'x': 5, 'y' : 10}}, 'd': [1, 2, 3]}
    >>> flat = dict(flat_items(example, key_separator='_'))
    >>> assert flat['c_b_y'] == 10
    """
    for k, v in d.items():
        if type(v) is dict:
            for k1, v1 in flat_items(v, key_separator=key_separator):
                yield key_separator.join((k, k1)), v1
        else:
            yield k, v
Vladimir Ignatev
  • 2,054
  • 1
  • 20
  • 34
2
def flatten_nested_dict(_dict, _str=''):
    '''
    recursive function to flatten a nested dictionary json
    '''
    ret_dict = {}
    for k, v in _dict.items():
        if isinstance(v, dict):
            ret_dict.update(flatten_nested_dict(v, _str = '_'.join([_str, k]).strip('_')))
        elif isinstance(v, list):
            for index, item in enumerate(v):
                if isinstance(item, dict):
                    ret_dict.update(flatten_nested_dict(item,  _str= '_'.join([_str, k, str(index)]).strip('_')))
                else:
                    ret_dict['_'.join([_str, k, str(index)]).strip('_')] = item
        else:
            ret_dict['_'.join([_str, k]).strip('_')] = v
    return ret_dict
Pradeep Pathak
  • 444
  • 5
  • 6
1

Using dict.popitem() in straightforward nested-list-like recursion:

def flatten(d):
    if d == {}:
        return d
    else:
        k,v = d.popitem()
        if (dict != type(v)):
            return {k:v, **flatten(d)}
        else:
            flat_kv = flatten(v)
            for k1 in list(flat_kv.keys()):
                flat_kv[k + '_' + k1] = flat_kv[k1]
                del flat_kv[k1]
            return {**flat_kv, **flatten(d)}
FredAKA
  • 1,248
  • 9
  • 8
1

If you do not mind recursive functions, here is a solution. I have also taken the liberty to include an exclusion-parameter in case there are one or more values you wish to maintain.

Code:

def flatten_dict(dictionary, exclude = [], delimiter ='_'):
    flat_dict = dict()
    for key, value in dictionary.items():
        if isinstance(value, dict) and key not in exclude:
            flatten_value_dict = flatten_dict(value, exclude, delimiter)
            for k, v in flatten_value_dict.items():
                flat_dict[f"{key}{delimiter}{k}"] = v
        else:
            flat_dict[key] = value
    return flat_dict

Usage:

d = {'a':1, 'b':[1, 2], 'c':3, 'd':{'a':4, 'b':{'a':7, 'b':8}, 'c':6}, 'e':{'a':1,'b':2}}
flat_d = flatten_dict(dictionary=d, exclude=['e'], delimiter='.')
print(flat_d)

Output:

{'a': 1, 'b': [1, 2], 'c': 3, 'd.a': 4, 'd.b.a': 7, 'd.b.b': 8, 'd.c': 6, 'e': {'a': 1, 'b': 2}}
Thomas Angeland
  • 441
  • 5
  • 10
1

I tried some of the solutions on this page - though not all - but those I tried failed to handle the nested list of dict.

Consider a dict like this:

d = {
        'owner': {
            'name': {'first_name': 'Steven', 'last_name': 'Smith'},
            'lottery_nums': [1, 2, 3, 'four', '11', None],
            'address': {},
            'tuple': (1, 2, 'three'),
            'tuple_with_dict': (1, 2, 'three', {'is_valid': False}),
            'set': {1, 2, 3, 4, 'five'},
            'children': [
                {'name': {'first_name': 'Jessica',
                          'last_name': 'Smith', },
                 'children': []
                 },
                {'name': {'first_name': 'George',
                          'last_name': 'Smith'},
                 'children': []
                 }
            ]
        }
    }

Here's my makeshift solution:

def flatten_dict(input_node: dict, key_: str = '', output_dict: dict = {}):
    if isinstance(input_node, dict):
        for key, val in input_node.items():
            new_key = f"{key_}.{key}" if key_ else f"{key}"
            flatten_dict(val, new_key, output_dict)
    elif isinstance(input_node, list):
        for idx, item in enumerate(input_node):
            flatten_dict(item, f"{key_}.{idx}", output_dict)
    else:
        output_dict[key_] = input_node
    return output_dict

which produces:

{
  owner.name.first_name: Steven,
  owner.name.last_name: Smith,
  owner.lottery_nums.0: 1,
  owner.lottery_nums.1: 2,
  owner.lottery_nums.2: 3,
  owner.lottery_nums.3: four,
  owner.lottery_nums.4: 11,
  owner.lottery_nums.5: None,
  owner.tuple: (1, 2, 'three'),
  owner.tuple_with_dict: (1, 2, 'three', {'is_valid': False}),
  owner.set: {1, 2, 3, 4, 'five'},
  owner.children.0.name.first_name: Jessica,
  owner.children.0.name.last_name: Smith,
  owner.children.1.name.first_name: George,
  owner.children.1.name.last_name: Smith,
}

A makeshift solution and it's not perfect.
NOTE:

  • it doesn't keep empty dicts such as the address: {} k/v pair.

  • it won't flatten dicts in nested tuples - though it would be easy to add using the fact that python tuples act similar to lists.

Gergely M
  • 583
  • 4
  • 11
1

Using flatdict library:

dic={'a': 1,
 'c': {'a': 2,
       'b': {'x': 5,
             'y' : 10}},
 'd': [1, 2, 3]}

import flatdict
f =  flatdict.FlatDict(dic,delimiter='_')
print(f)
#output
{'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
Talha Tayyab
  • 8,111
  • 25
  • 27
  • 44
0

Variation of this Flatten nested dictionaries, compressing keys with max_level and custom reducer.

  def flatten(d, max_level=None, reducer='tuple'):
      if reducer == 'tuple':
          reducer_seed = tuple()
          reducer_func = lambda x, y: (*x, y)
      else:
          raise ValueError(f'Unknown reducer: {reducer}')

      def impl(d, pref, level):
        return reduce(
            lambda new_d, kv:
                (max_level is None or level < max_level)
                and isinstance(kv[1], dict)
                and {**new_d, **impl(kv[1], reducer_func(pref, kv[0]), level + 1)}
                or {**new_d, reducer_func(pref, kv[0]): kv[1]},
                d.items(),
            {}
        )

      return impl(d, reducer_seed, 0)
user2528473
  • 321
  • 2
  • 5
  • 9
0

You can use recursion in order to flatten your dictionary.

import collections


def flatten(
    nested_dict,
    seperator='.',
    name=None,
):
    flatten_dict = {}

    if not nested_dict:
        return flatten_dict

    if isinstance(
        nested_dict,
        collections.abc.MutableMapping,
    ):
        for key, value in nested_dict.items():
            if name is not None:
                flatten_dict.update(
                    flatten(
                        nested_dict=value,
                        seperator=seperator,
                        name=f'{name}{seperator}{key}',
                    ),
                )
            else:
                flatten_dict.update(
                    flatten(
                        nested_dict=value,
                        seperator=seperator,
                        name=key,
                    ),
                )
    else:
        flatten_dict[name] = nested_dict

    return flatten_dict


if __name__ == '__main__':
    nested_dict = {
        1: 'a',
        2: {
            3: 'c',
            4: {
                5: 'e',
            },
            6: [1, 2, 3, 4, 5, ],
        },
    }

    print(
        flatten(
            nested_dict=nested_dict,
        ),
    )

Output:

{
   "1":"a",
   "2.3":"c",
   "2.4.5":"e",
   "2.6":[1, 2, 3, 4, 5]
}
Alon Barad
  • 1,491
  • 1
  • 13
  • 26
0
def flatten(dictionary, prefix = '', separator = '_'):
    out_dict = {}
    if type(dictionary) != dict:
        out_dict[prefix] = dictionary
        return out_dict
    elif dictionary is None:
        return None
    for k in dictionary.keys():
        if prefix:
            prefix_n = prefix + f'{separator}{k}'
        else:
            prefix_n = k
        out_dict.update(flatten_new(dictionary[k], prefix_n))
    return out_dict

Output:

{'a': 1, 'c_a': 2, 'c_b_x': 5, 'c_b_y': 10, 'd': [1, 2, 3]}
gregoruar
  • 345
  • 3
  • 14