
I would like to transform a JSON file or dict object into a table ready for SQL queries.

I have already found several great topics but none of them seem to answer my question:
Python flatten multilevel/nested JSON
Flatten nested dictionaries, compressing keys

In my case, I have JSON files in which every occurrence with the same structure is stacked in a list, like this:

[{
    "_id": "some_id_value",
    "time": {
        "$date": "2019-12-28"
    },
    "boolean_key": false,
    "data_stacked": [{
        "name": "task1",
        "time": {
            "$date": "2019-12-28"
        },
        "time_spent": 2.2,
        "size": {
            "$number": "0"
        },
        "other": {}
    }, {
        "name": "task2",
        "time": {
            "$date": "2019-12-28"
        },
        "time_spent": 2.7,
        "size": {
            "$number": "0"
        },
        "other": {}
    }]
}, {
    "_id": "some_other_id_value",
    "time": {
        "$date": "2019-12-30"
    },
    "boolean_key": true,
    "data_stacked": [{
        "name": "task3",
        "time": {
            "$date": "2019-12-30"
        },
        "time_spent": 3.4,
        "size": {
            "$number": "0"
        },
        "other": {}
    }, {
        "name": "task4",
        "time": {
            "$date": "2019-12-30"
        },
        "time_spent": 4.1,
        "size": {
            "$number": "0"
        },
        "other": {}
    }]
}]

and I would like to iterate over the first-level keys, which represent the standard attributes, to get one flat object per data_stacked element:

[{
    "_id": "some_id_value",
    "time.$date": "2019-12-28",
    "boolean_key": false,
    "data_stacked.name": "task1",
    "data_stacked.time.$date": "2019-12-28",
    "data_stacked.time_spent": 2.2,
    "data_stacked.size.$number": "0",
    "data_stacked.other": {}
}, {
    "_id": "some_id_value",
    "time.$date": "2019-12-28",
    "boolean_key": false,
    "data_stacked.name": "task2",
    "data_stacked.time.$date": "2019-12-28",
    "data_stacked.time_spent": 2.7,
    "data_stacked.size.$number": "0",
    "data_stacked.other": {}
}, {
    "_id": "some_other_id_value",
    "time.$date": "2019-12-30",
    "boolean_key": true,
    "data_stacked.name": "task3",
    "data_stacked.time.$date": "2019-12-30",
    "data_stacked.time_spent": 3.4,
    "data_stacked.size.$number": "0",
    "data_stacked.other": {}
}, {
    "_id": "some_other_id_value",
    "time.$date": "2019-12-30",
    "boolean_key": true,
    "data_stacked.name": "task4",
    "data_stacked.time.$date": "2019-12-30",
    "data_stacked.time_spent": 4.1,
    "data_stacked.size.$number": "0",
    "data_stacked.other": {}
}]
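For reference, here is a plain-Python sketch of that transformation (the function names are my own, and it assumes the nested list always lives under a single known key such as data_stacked):

```python
def flatten(d, prefix=""):
    """Flatten nested dicts into dotted keys; empty dicts are kept as values."""
    out = {}
    for key, value in d.items():
        dotted = prefix + key
        if isinstance(value, dict) and value:
            out.update(flatten(value, dotted + "."))
        else:
            out[dotted] = value
    return out

def explode(records, list_key="data_stacked"):
    """Emit one flat row per element of record[list_key],
    repeating the parent's flattened fields on every row."""
    rows = []
    for record in records:
        parent = flatten({k: v for k, v in record.items() if k != list_key})
        for item in record.get(list_key, []):
            row = dict(parent)
            row.update(flatten(item, list_key + "."))
            rows.append(row)
    return rows
```

pandas.json_normalize(data, record_path="data_stacked", meta=["_id", "boolean_key", ["time", "$date"]], record_prefix="data_stacked.") should produce essentially the same table in one call, if pandas is an option.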

I have found a similar case here: How to flatten up embedded JSON into multiple documents
But I didn't succeed in running that example; I have never used JS before, so I don't know what is missing.

On my last attempt, I tried to loop over the keys:

import json

def flatten_json2(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

tryjson = json.load(testjson)  # testjson is an open file object
for i in tryjson:
    test = (i['_id'], flatten_json2(i['data_stacked']))
    print(test)

The result isn't good: the _id key is missing because I only looked up its value (I don't know how to return both the key and the value for a specific field). data_stacked seems to be flattened correctly, but the whole object is a tuple, so I can't use json_normalize either.

Do you have a solution for this issue?
Best regards.

tdranv
jbdda
1 Answer


I think json_normalize should be enough if I start from the MongoDB query result, with something like $unwind: "$data_stacked" followed by

 $project: {
        _id: 1,
        "time.$date": 1,
        "data_stacked.name": 1,
        ...
    }

Maybe $group instead of $project, since I would like to return _id in each object? Anyway, I can't test this option for now, so I'm still looking for a solution. :)
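Written out in full, the pipeline I have in mind would look roughly like this in pymongo syntax (the collection name and the exact field list are assumptions, and I still can't run it). Note that if the $date wrappers come from mongoexport, $date is Extended-JSON type syntax rather than a real stored field, so I project time whole and let json_normalize flatten whatever comes back:

```python
# Hypothetical pipeline (untested): $unwind emits one document per
# data_stacked element, then $project keeps the columns for the SQL table.
pipeline = [
    {"$unwind": "$data_stacked"},
    {"$project": {
        "_id": 1,
        "time": 1,
        "boolean_key": 1,
        "data_stacked.name": 1,
        "data_stacked.time": 1,
        "data_stacked.time_spent": 1,
        "data_stacked.size": 1,
    }},
]
# docs = list(db.my_collection.aggregate(pipeline))  # my_collection is a placeholder
# flat = pandas.json_normalize(docs)                 # dotted columns appear here
```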

jbdda