Flatten JSON / Dictionaries / List

Question

I have a nested JSON

{
    "ID": 300,
    "Name": " TEST",
    "Value": [
        {
            "Details": [
                {
                    "Name": "TEST1",
                    "Value": "XXXXXX"
                },
                {
                    "Name": "TEST2",
                    "Value": "DDDDDDDD"
                }
            ],
            "Time": [ 1600358400, 1600358700, 1600359000],
            "Values": [ 0, 0, 0]
        }
    ]
}

I want to flatten the json to be able to get a list like

I used itertools groupby but unable to acheived the desired results. It is flattening horizontally.

Code I have tried so far

from itertools import groupby
import json

def myflatten(d, depth=0):
    rv = [({}, depth)]
    if isinstance(d, dict):
        for k, v in d.items():
            if not isinstance(v, dict) and not isinstance(v, list):
                for i in rv:
                    i[0][k] = v
            else:
                for (vv, _depth) in myflatten(v,depth+1):
                    rv.append((rv[-1][0].copy(), _depth))
                    for kkk, vvv in vv.items():
                        rv[-1][0][kkk] = vvv
    elif isinstance(d, list):
        for v in d:
            rv.append((rv[-1][0].copy(), depth+1))
            for (vv, _) in myflatten(v,depth+1):
                for kkk, vvv in vv.items():
                    rv[-1][0][kkk] = vvv
    for i, _depth in rv:
        yield i, _depth

out = []

a = {
    "ID": 300,
    "Name": " TEST",
    "Value": [
        {
            "Details": [
                {
                    "Name": "TEST1",
                    "Value": "XXXXXX"
                },
                {
                    "Name": "TEST2",
                    "Value": "DDDDDDDD"
                }
            ],
            "Time": [ 1600358400, 1600358700, 1600359000],
            "Values": [ 0, 0, 0]
        }
    ]
}

for v, g in groupby(sorted(myflatten(a), key=lambda k: -k[1]), lambda k: k[1]):
    out.extend(i[0] for i in g)
    break
print(out)

Can someone help to flatten the nested json/dict/list vertically rather than horizontally? The final objective is to be able to store the data in a RDBMS without having to undefinitely increase number of columns but rather rows.

Your code is incorrectly indented. Please fix that. Also, please post what you get right now and narrow down the source of the error by creating a [mre] — Pranav Hosangadi, Sep 18 '20 at 15:25
Getting close to the solution: https://stackoverflow.com/questions/51359783/python-flatten-multilevel-nested-json — nomad, Sep 18 '20 at 15:30
@nomad Please specify what output you want. The image doesn't count since the structure _you_ want cannot be inferred... Do you want it to be a list of lists? — Melvin Abraham, Sep 18 '20 at 15:49
Do you want to flatten in general, or using domain knowledge of the input? What about zip(details, values)? That looks like domain knowledge, although we could just automatically zip any lists below the outermost list. — Kenny Ostrom, Sep 18 '20 at 16:03
@MelvinAbraham: I want to flatten it with a line with the details from each nested list. I do not know if this is clear enough. You can infer that the image is a flatted list with ID that identify the nested elements. — nomad, Sep 18 '20 at 16:09
@KennyOstrom: I want to flatten it so that i can insert the value in a database. I tried the zip [[k] + [';'.join(v) for v in zip(*g)[1:]] for k,g in groupby(a, itemgetter(0))] but could not make it work, basically i want to extract the information into rows and columns that i can insert in a RDBMS. — nomad, Sep 18 '20 at 16:11
I'm asking if we can use the structure already in the json, and know it's always going to be like that, or if we have to generalize. — Kenny Ostrom, Sep 18 '20 at 16:14
@KennyOstrom, generalizing. As a times I may have less nested with other attributes. — nomad, Sep 18 '20 at 16:15
rdb isn't going to like that, but okay. Now, suppose we drill down and find two different fields with lists. How do we know that we zip(details, time, values) but we take the product of value with those zipped records? — Kenny Ostrom, Sep 18 '20 at 16:17
@KennyOstrom, The product of the values should be ok. If some additional processing is required just before inserting the data in the Database, I can manage that . I just have to manage to be able to do a kind of "cartesian product" of all nested list. — nomad, Sep 18 '20 at 16:26
The function described here seems to have the "generalized" approach. But I cannot make it work completely for the moment. https://stackoverflow.com/questions/51359783/python-flatten-multilevel-nested-json — nomad, Sep 18 '20 at 16:29

Kenny Ostrom · Answer 1 · 2020-09-19T16:16:16.683

In order to generalize this, we need to flatten everything, but return iterators to the lists. These iterators we shall then zip together, so they all advance at the same pace. But these iterators might be dicts which themselves must be flattened, remembering the heirarchy they came from.

Additionally, note that the ID, Name, Value of the outer dict are exempt from all this, and not included in the recursive naming scheme, so we shall write a top level handler that parses those, and then sets up the rest. And "Details.ID" is not part of the data, so I hardcoded it, with the id taken from enumerate on the aforementioned zip of lists.

I'll admit, generalizing the flatten function to handle arbitrary nesting of dicts and lists and scalar data was a bit much for me, as we'd need to chain iterators and pick a consistent return type, for several different types of input. Too much ...

Instead, I use the structure of the data to a limited extent, and assume there's only one level which has the lists. I only flatten within the data. That simplifies the problem significantly, while still accepting arbitrary names and values at all levels:

import itertools, json, typing, pprint

# top level function starts recursion
def parse(data):
    fixed_fields = {k:v for k,v in data.items() if not isinstance(v, typing.Iterable)}
    for testcase in data['Value']:
        for testcase in data['Value']:
            for record in parse_testcase(testcase):
                record.update(fixed_fields)
                yield record

def parse_testcase(testcase):
    names = []
    values = []
    for key, value in testcase.items():
        names.append(key)
        values.append(itertools.chain(value))
    
    for details_id, row in enumerate(zip(*values)):
        record = {'Details.ID': details_id}
        for name, value in zip(names, row):
            if isinstance(value, dict):
                flatten(name, value, record)
            else:
                record[name] = value
        yield record
        
def flatten(parent_key, details, result):
    for key, value in details.items():
        keyname = get_keyname(parent_key, key)
        if isinstance(value, dict):
            flatten(keyname, value, result)
        else:
            result[keyname] = value

def get_keyname(parent_key, key):
    if parent_key:
        return '.'.join((parent_key, key))
    return key

text = """{
    "ID": 300,
    "Name": " TEST",
    "Value": [
        {
            "Details": [
                {
                    "Name": "TEST1",
                    "Value": "XXXXXX"
                },
                {
                    "Name": "TEST2",
                    "Value": "DDDDDDDD"
                }
            ],
            "Time": [ 1600358400, 1600358700, 1600359000],
            "Values": [ 0, 0, 0]
        }
    ]
}"""

for record in parse(json.loads(text)):
    pprint.pprint(record)

Since you have 2 Details records, and 3 of the other records, I'm forced to suspect that your expected output is wrong. The only parsing rule I could make sense of was to advance all the lists in sync, and therefore drop the extra time and values measurements. If you can help me understand the parsing rules so I understand how you got four rows, we can revisit this. I got:

{'Details.ID': 0,
'Details.Name': 'TEST1',
'Details.Value': 'XXXXXX',
'ID': 300,
'Time': 1600358400,
'Values': 0}

{'Details.ID': 1,
'Details.Name': 'TEST2',
'Details.Value': 'DDDDDDDD',
'ID': 300,
'Time': 1600358700,
'Values': 0}

I also note that your output has an arbitrary scheme determined by the input. Perhaps it would be wise to output a csv file, and make fitting it into a relational database be a separate task.

If the correct output should have four rows, just let me know, and I can update or delete this answer. I don't understand how you got that. If it's an outer join of several tables, perhaps we should be parsing those tables individually. — Kenny Ostrom, Sep 19 '20 at 16:19
Since we have multiple "Time", we should have as much result. But I understand your idea. — nomad, Sep 21 '20 at 17:21

score 0 · Answer 2 · answered Sep 18 '20 at 16:24

Since in your case, you need to store the rows (or records) in a database, you can have a function that generates a list of dictionaries which you can add to the database iteratively. Also, since your code seems to be too nested, you might consider the following snippet of code:

def flatten(dictionary):
    my_list = []
    _id = dictionary["id"]
    name = dictionary["Name"]

    for obj in dictionary["Value"]:
        details = obj["Details"]
        time = obj["Time"]
        vals = obj["Values"]

        for i in range(len(time)):
            for (index, detail_obj) in enumerate(details):
                my_list.append({
                    "ID": _id,
                    "Name": name,
                    "Details.ID": index,
                    "Details.Name": detail_obj["Name"]
                    "Details.Value": detail_obj["Value"],
                    "Time.ID": i,
                    "Time.Time": time[i],
                    "Time.Value": vals[i]
                })

    return my_list

NOTE: This function will process a single nested dictionary (like the one you provided in the question) at a time. So, if you have multiple nested dictionary like so, you might want to call this function for each of these nested dictionary.

Thanks, I can give this a try for the moment. But this wait i have to always have the same keys and same nested list in my json. In my case it can vary. — nomad, Sep 18 '20 at 16:28
I am trying to use the function provided here. I still have few issues but this is quite close to what i need. https://stackoverflow.com/questions/51359783/python-flatten-multilevel-nested-json https://stackoverflow.com/questions/51359783/python-flatten-multilevel-nested-json — nomad, Sep 18 '20 at 16:28
I can have another key with nested value just like the details key. I can still use your code to make it work, but at every time i will have a new nested key, i can to manually update the code — nomad, Sep 18 '20 at 16:35

Flatten JSON / Dictionaries / List

2 Answers2