Combination of Index and MultiIndex in pandas ( Nested JSON representation in dataframe )

Question

I have a nested JSON like this

{ key1: "value1", key2: { key21: { key211: "value211" } }

If I create a MultiIndex from tuples of keys I get

[(key1, nan, nan), (key2, key21, key211)]

As you can see with this I cannot do

df[key1] = 'scalar'

because def[key1] is a dataframe which will result into level mismatch while assignment.

So, how to use dataframe for such JSON ?

What I know so far -

Normalise - I can normalise the data and operate. But I have to convert back to JSON. pandas has no out of the box API for this. So I can loop on rows and make objects. But if multiple rows makes one object in case of any key being a list, I have to group or index then make it. In any case, I would like to avoid looping.

What I am thinking -

Maybe a combination of Index and MultiIndex. So that df[key1] is Series and I can also use to_dict API of pandas DataFrame. But I am not able to figure out how to do this or even if this is possible.
Is there a way to take care of nan in my MultiIndex, so that I can access df[key1] as a series. This way I don't have to worry about how deep my levels go.

UPDATE 1

I still could not figure out to handle this scenario with MultiIndex, so I ended up working with flat/normalised data. So anyone with a deadline, I would recommend working on flat structure and use the utilities given in this question - Inverse of Pandas json_normalize

Combination of Index and MultiIndex in pandas ( Nested JSON representation in dataframe )

0 Answers0