TL;DR:
df = pd.json_normalize(my_dict, sep='_')
df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.split('_')))
df = df.stack(level=0).droplevel(0)
We can use pd.json_normalize. Since it joins the column names at different depths with dots (.), we can split the column names and create a pd.MultiIndex from the resulting tuples:
>>> df = pd.json_normalize(my_dict)
>>> tuple_cols = df.columns.str.split('.')
>>> df.columns = pd.MultiIndex.from_tuples(tuple(i) for i in tuple_cols)
We can also transpose with zip and use from_arrays:
>>> df = pd.json_normalize(my_dict)
>>> df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.split('.')))
Either way, df becomes the following:
       version1
           perl                                                   C
   line_covered  line_total  func_covered  func_total  line_covered  line_total  func_covered  func_total
0           207         312            15          18           321         512            10          10
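To try both variants end to end, here is a minimal sketch. The contents of my_dict are an assumption, reconstructed from the values printed above; the real dictionary may differ. The names flat, parts, df_tuples and df_arrays are just illustrative.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output; not the original dict.
my_dict = {
    "version1": {
        "perl": {"line_covered": 207, "line_total": 312,
                 "func_covered": 15, "func_total": 18},
        "C": {"line_covered": 321, "line_total": 512,
              "func_covered": 10, "func_total": 10},
    }
}

# One row; column names look like 'version1.perl.line_covered'.
flat = pd.json_normalize(my_dict)
parts = flat.columns.str.split(".")  # Index of lists, one list per column

# Variant 1: one tuple per column.
df_tuples = flat.copy()
df_tuples.columns = pd.MultiIndex.from_tuples(tuple(p) for p in parts)

# Variant 2: transpose with zip so there is one sequence per level.
df_arrays = flat.copy()
df_arrays.columns = pd.MultiIndex.from_arrays(zip(*parts))

print(df_arrays)
```

Both variants should give the same three-level column index, so which one you pick is mostly a matter of taste.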
Because your keys themselves seem to be concatenated with _, we can split on that separator too:
>>> df = pd.json_normalize(my_dict, sep='_')
>>> df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.split('_')))
>>> df
   version1
       perl                               C
       line            func            line            func
    covered  total  covered  total  covered  total  covered  total
0       207    312       15     18      321    512       10     10
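One caveat with splitting on _: zip stops at the shortest sequence, so if any key contains an extra underscore (or none at all), levels get dropped silently. A minimal guard might look like this. The dict contents are assumed from the output above, and the check itself is my addition, not part of the original recipe.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output; not the original dict.
my_dict = {
    "version1": {
        "perl": {"line_covered": 207, "line_total": 312,
                 "func_covered": 15, "func_total": 18},
        "C": {"line_covered": 321, "line_total": 512,
              "func_covered": 10, "func_total": 10},
    }
}

df = pd.json_normalize(my_dict, sep="_")
parts = df.columns.str.split("_")

# Every column name must split into the same number of pieces,
# otherwise zip(*parts) would silently truncate to the shortest one.
depths = {len(p) for p in parts}
if len(depths) != 1:
    raise ValueError(f"column names split into uneven depths: {sorted(depths)}")

df.columns = pd.MultiIndex.from_arrays(zip(*parts))
```

If the check fails, splitting on a separator that cannot appear in your keys is probably the safer route.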
But still, as far as I understand, the first level of your dictionary is actually the row name. So let's stack the first column level and drop the old index (I'm creating a second entry for clarity):
>>> my_dict['version2'] = my_dict['version1']
>>> df = pd.json_normalize(my_dict, sep='_')
>>> df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.split('_')))
>>> df
   version1                                           version2
       perl                        C                      perl                        C
       line     func            line            func      line     func            line            func
      total  covered  total  covered  total  covered     total  covered  total  covered  total  covered
0       312       15     18      321    512       10       312       15     18      321    512       10
>>> df = df.stack(level=0).droplevel(0)
And you're good to go:
                C                     perl
             func     line            func          line
          covered  covered  total  covered  total  total
version1       10      321    512       15     18    312
version2       10      321    512       15     18    312
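If you need this more than once, the three steps can be wrapped in a small function. This is only a sketch: nested_to_frame is a hypothetical name, the dict contents are assumed from the earlier output, and it relies on every key path splitting into the same number of pieces.

```python
import pandas as pd


def nested_to_frame(nested, sep="_"):
    """Hypothetical helper: top-level keys become rows, deeper keys column levels."""
    flat = pd.json_normalize(nested, sep=sep)
    # Split the joined names back into one sequence per level.
    flat.columns = pd.MultiIndex.from_arrays(zip(*flat.columns.str.split(sep)))
    # Move the first column level (the top-level keys) into the index,
    # then drop the leftover 0 from the single normalized row.
    return flat.stack(level=0).droplevel(0)


# Assumed input, reconstructed from the printed output; not the original dict.
my_dict = {
    "version1": {
        "perl": {"line_covered": 207, "line_total": 312,
                 "func_covered": 15, "func_total": 18},
        "C": {"line_covered": 321, "line_total": 512,
              "func_covered": 10, "func_total": 10},
    }
}
my_dict["version2"] = my_dict["version1"]  # second entry, as in the answer

result = nested_to_frame(my_dict)
print(result)
```

Individual cells can then be looked up by row label and column tuple, e.g. result.loc['version2', ('C', 'line', 'total')].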
Tip: If your JSON doesn't have all levels occupied, i.e. the actual values sit at different depths, plain zip would stop at the shortest column name, so consider using itertools.zip_longest:
from itertools import zip_longest
df = pd.json_normalize(my_dict)
tuple_cols = df.columns.str.split('.')
df.columns = pd.MultiIndex.from_arrays(zip_longest(*tuple_cols))
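To see what that does on uneven input, here is a small sketch. ragged_dict and its values are made up for illustration. Passing fillvalue='' is an optional choice that is not in the snippet above; without it the missing level entries are None and show up as NaN in the column index.

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical input: 'total' sits one level higher than 'line_covered'.
ragged_dict = {"version1": {"perl": {"line_covered": 207}, "total": 5}}

df = pd.json_normalize(ragged_dict)
tuple_cols = df.columns.str.split(".")  # lists of length 3 and 2

# Pad the shorter names with '' so every column has three levels.
df.columns = pd.MultiIndex.from_arrays(zip_longest(*tuple_cols, fillvalue=""))

print(df.columns.tolist())
```

The empty-string padding keeps the header readable when printed, at the cost of having to include '' when you select such a column by its full tuple.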