I want to create a CSV file from a very complex dict. The real dict has thousands of keys and more than 9 levels of nesting, but this small example shows the structure:

import pandas
my_stuff = [
    {
        "a":
            [
                {"1": "example1"},
                {"2": [
                    {"2": "example2"},
                    {"3": "example3"}
                ]},
                {"4": "example4"},
                {"5": "example5"}
            ],
        "b":
            [
                "example6", "61", "62"
            ]
        }
]
result = pandas.json_normalize(my_stuff)
print(result.to_csv())

That prints:

,a,b
0,"[{'1': 'example1'}, {'2': [{'2': 'example2'}, {'3': 'example3'}]}, {'4': 'example4'}, {'5': 'example5'}]","['example6', '61', '62']"

But I want this output:

"0.a.0.1, 0.a.0.2.2, 0.a.0.2.3, 0.a.0.4, 0.a.0.5, 0b.0"
"example1, example2, example3, example4, example5, example6;61;62"

I thought pandas would be able to flatten the dict, but it seems it cannot. I need the keys to be used as headers, like sectiona.subsection1.fieldwhatever, because the .csv will later be loaded into a database.

I hope someone can help.

Bonus: I tried without pandas but got stuck here:

def flatten(py_structure, depth=""):
    """make a flatten dict"""
    new_dict = {}
    if isinstance(py_structure, dict):
        for k, v in py_structure.items():
            if isinstance(v, dict):
                flattened_v = flatten(v, k)
            elif isinstance(v, list):
                flattened_v = flatten(v, k)
            else:
                flattened_v = v
            new_dict[f"{depth}{k}"] = flattened_v
        return new_dict
    elif isinstance(py_structure, list):
        for idx, v in enumerate(py_structure):
            new_dict[f"{depth}{idx}"] = flatten(v, f"{depth}{idx}")
        return new_dict
Saelyth
  • Did you try the answers to this question? https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys – ont.rif Apr 19 '21 at 11:15
  • Yes, and aside from a deprecation warning (DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working if isinstance(v, collections.MutableMapping):), the result is not what I expected. It produced this: `{'a': [{'1': 'example1'}, {'2': [{'2': 'example2'}, {'3': 'example3'}]}, {'4': 'example4'}, {'5': 'example5'}], 'b': ['example6', '61', '62']}` Note that I am also dealing with a list of dicts, not just a dict. – Saelyth Apr 19 '21 at 11:29

1 Answer

You can achieve this with a depth-first traversal of a custom tree container:

import pprint


class Container:
    def __init__(self, data):
        self.is_leaf = False
        if type(data) is list:
            self.data = [Container(x) for x in data]
        elif type(data) is dict:
            self.data = {k: Container(v) for k, v in data.items()}
        else:
            self.is_leaf = True
            self.data = data

    def walk(self, callback):
        self._walk(self, callback=callback, path=[])

    def _walk(self, container, callback=None, path=None):
        if type(container.data) is not dict \
           and all(x.is_leaf for _, x in container.items()):
            callback(".".join(path), [x.data for _, x in container.items()])
        else:
            for k, c in container.items():
                self._walk(c, callback=callback, path=path+[str(k)])

    def items(self):
        if type(self.data) is list:
            yield from enumerate(self.data)
        elif type(self.data) is dict:
            yield from self.data.items()
        else:
            yield None, self

    def flatten(self):
        result = {}

        def callback(key, value):
            result[key] = value

        self.walk(callback)
        return result


data = [
    {
        "a":
            [
                {"1": "example1"},
                {"2": [
                    {"2": "example2"},
                    {"3": "example3"}
                ]},
                {"4": "example4"},
                {"5": "example5"}
            ],
        "b":
            [
                "example6", "61", "62"
            ]
        }
]

c = Container(data)
pprint.pprint(c.flatten())

This outputs:

{'0.a.0.1': ['example1'],
 '0.a.1.2.0.2': ['example2'],
 '0.a.1.2.1.3': ['example3'],
 '0.a.2.4': ['example4'],
 '0.a.3.5': ['example5'],
 '0.b': ['example6', '61', '62']}
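
If the end goal is the CSV text itself, the flattened mapping still has to be written out as a header row plus a value row. Here is a minimal sketch of that step. It assumes the same dotted-path scheme as the output above and joins multi-value leaves with ";" as in the question's desired output. The names flatten_paths and to_csv_text are made up for illustration, and it uses a plain recursive generator with the standard csv module instead of the Container class, so treat it as one possible approach rather than the only one:

```python
import csv
import io


def flatten_paths(node, path=()):
    """Yield (dotted_path, list_of_leaf_values) pairs, depth first.

    Hypothetical helper: mirrors the key scheme shown above.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten_paths(value, path + (str(key),))
    elif isinstance(node, list):
        if all(not isinstance(item, (dict, list)) for item in node):
            # A list made only of plain values becomes a single column.
            yield ".".join(path), list(node)
        else:
            # Otherwise the list index becomes part of the path.
            for index, item in enumerate(node):
                yield from flatten_paths(item, path + (str(index),))
    else:
        # A single plain value is wrapped in a one-element list.
        yield ".".join(path), [node]


def to_csv_text(records):
    """Return a two-line CSV: dotted paths as header, joined values as row."""
    flat = dict(flatten_paths(records))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(flat.keys())
    # Assumption: multi-value leaves are joined with ";" inside one cell.
    writer.writerow(";".join(str(v) for v in values) for values in flat.values())
    return buffer.getvalue()


data = [
    {
        "a": [
            {"1": "example1"},
            {"2": [{"2": "example2"}, {"3": "example3"}]},
            {"4": "example4"},
            {"5": "example5"},
        ],
        "b": ["example6", "61", "62"],
    }
]

print(to_csv_text(data))
```

For the sample data this prints the header 0.a.0.1,0.a.1.2.0.2,0.a.1.2.1.3,0.a.2.4,0.a.3.5,0.b followed by the row example1,example2,example3,example4,example5,example6;61;62. The csv module quotes any cell that happens to contain a comma, so values do not need to be escaped by hand.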
ont.rif