pandas has explode
to unwrap lists but I am not aware of a method for dictionaries.
As your dictionary is extremely well structured, you can try
[28]: pd.Series(d).apply(pd.Series).stack().apply(pd.Series).stack().apply(pd
...: .Series).stack().explode().apply(pd.Series).reset_index().rename(column
...: s={'level_0': 'teamId', 'level_1': 'matchPeriod', 'level_2': 'eventSec'
...: , 'level_3': 'type'})
Out[28]:
teamId matchPeriod eventSec type playerId
0 7453 2H 1155 in 281253
1 7453 2H 1155 in 169212
2 7453 2H 1155 out 449240
3 7453 2H 1155 out 257943
4 7453 2H 2011 in 449089
.. ... ... ... ... ...
11 7454 2H 1627 out 56386
12 7454 2H 2725 in 56108
13 7454 2H 2725 out 56383
14 7454 1H 2833 in 56390
15 7454 1H 2833 out 208089
Although extremely ugly chaining the Series
constructor and stack
will build up the DataFrame level by level.
Update:
In principle you can pass a dictionary to the DataFrame
and Series
constructors
In [2]: d
Out[2]:
{'7453': {'2H': {'1155': {'in': [{'playerId': 281253}, {'playerId': 169212}],
'out': [{'playerId': 449240}, {'playerId': 257943}]},
'2011': {'in': [{'playerId': 449089}], 'out': [{'playerId': 69374}]},
'2568': {'in': [{'playerId': 481900}], 'out': [{'playerId': 1735}]}}},
'7454': {'1H': {'2833': {'in': [{'playerId': 56390}],
'out': [{'playerId': 208089}]}},
'2H': {'687': {'in': [{'playerId': 574}], 'out': [{'playerId': 578855}]},
'1627': {'in': [{'playerId': 477400}], 'out': [{'playerId': 56386}]},
'2725': {'in': [{'playerId': 56108}], 'out': [{'playerId': 56383}]}}}}
In [3]: pd.DataFrame(d)
Out[3]:
7453 7454
2H {'1155': {'in': [{'pl... {'687': {'in': [{'pla...
1H NaN {'2833': {'in': [{'pl...
In [4]: pd.Series(d)
Out[4]:
7453 {'2H': {'1155': {'in'...
7454 {'1H': {'2833': {'in'...
dtype: object
As they are 2-dimensional and 1-dimensional data structures respectively, they also expect a dictionary with 2 and 1 level deep nesting respectively. The DataFrame
interprets your 'teamId' as index and 'matchPeriod' as columns and the values are the values of the dictionaries like in
In [5]: d['7453']['2H']
Out[5]:
{'1155': {'in': [{'playerId': 281253}, {'playerId': 169212}],
'out': [{'playerId': 449240}, {'playerId': 257943}]},
'2011': {'in': [{'playerId': 449089}], 'out': [{'playerId': 69374}]},
'2568': {'in': [{'playerId': 481900}], 'out': [{'playerId': 1735}]}}
The Series
behaves the same way, but with only one level.
In [6]: d['7453']
Out[6]:
{'2H': {'1155': {'in': [{'playerId': 281253}, {'playerId': 169212}],
'out': [{'playerId': 449240}, {'playerId': 257943}]},
'2011': {'in': [{'playerId': 449089}], 'out': [{'playerId': 69374}]},
'2568': {'in': [{'playerId': 481900}], 'out': [{'playerId': 1735}]}}}
is your first level. Now this is a dictionary again, so you can pass it the the Series
constructor as well
In [7]: pd.Series(d['7453'])
Out[7]:
2H {'1155': {'in': [{'pl...
dtype: object
The apply
function allows you to do this for every row of the Series
In [8]: pd.Series(d).apply(pd.Series)
Out[8]:
2H 1H
7453 {'1155': {'in': [{'pl... NaN
7454 {'687': {'in': [{'pla... {'2833': {'in': [{'pl...
Now you arrive at the same result as with the DataFrame
constructor. This is called broadcasting. Each value of the original Series
no becomes its own Series
and the index is used as column labels. By calling stack
you intead tell pandas to give you a series intead and stack all the labels to a MultiIndex
if needed.
In [9]: pd.Series(d).apply(pd.Series).stack()
Out[9]:
7453 2H {'1155': {'in': [{'pl...
7454 2H {'687': {'in': [{'pla...
1H {'2833': {'in': [{'pl...
dtype: object
Now you again have a Series (with a 2d index) where each value is a dictionary which - again - can be passed to the Series
constructor. So if you repeat this chain of apply(pd.Series).stack()
you get
In [10]: pd.Series(d).apply(pd.Series).stack().apply(pd.Series).stack()
Out[10]:
7453 2H 1155 {'in': [{'playerId': ...
2011 {'in': [{'playerId': ...
2568 {'in': [{'playerId': ...
7454 2H 687 {'in': [{'playerId': ...
1627 {'in': [{'playerId': ...
2725 {'in': [{'playerId': ...
1H 2833 {'in': [{'playerId': ...
dtype: object
Now you again have a Series (with a 3d index) where each value is a dictionary which - again - can be passed to the Series
constructor.
In [11]: pd.Series(d).apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack()
Out[11]:
7453 2H 1155 in [{'playerId': 281253}...
out [{'playerId': 449240}...
2011 in [{'playerId': 449089}]
out [{'playerId': 69374}]
2568 in [{'playerId': 481900}]
out [{'playerId': 1735}]
7454 2H 687 in [{'playerId': 574}]
out [{'playerId': 578855}]
1627 in [{'playerId': 477400}]
out [{'playerId': 56386}]
2725 in [{'playerId': 56108}]
out [{'playerId': 56383}]
1H 2833 in [{'playerId': 56390}]
out [{'playerId': 208089}]
dtype: object
This is a special case as now your values are no longer dictionaries but lists (with one element each). For lists (and unfortunately not for dictionaries) there is the explode()
method in pandas to create a new row for each list element.
In [13]: pd.Series(d).apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().explode()
Out[13]:
7453 2H 1155 in {'playerId': 281253}
in {'playerId': 169212}
out {'playerId': 449240}
out {'playerId': 257943}
2011 in {'playerId': 449089}
...
7454 2H 1627 out {'playerId': 56386}
2725 in {'playerId': 56108}
out {'playerId': 56383}
1H 2833 in {'playerId': 56390}
out {'playerId': 208089}
dtype: object
unpacks each list. Now you again have a Series (with a 4d index) where each value is a dictionary which - again - can be passed to the Series
constructor.
In [14]: pd.Series(d).apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().explode().apply(pd.Series).stack()
Out[14]:
7453 2H 1155 in playerId 281253
playerId 169212
out playerId 449240
playerId 257943
2011 in playerId 449089
...
7454 2H 1627 out playerId 56386
2725 in playerId 56108
out playerId 56383
1H 2833 in playerId 56390
out playerId 208089
dtype: int64
With these five iterations of applying the Series
constructor to your dictionary and reshaping the data until you can apply it again, you got your dictionary fully unpacked.
In order to match your desired result you can make all levels of the index to a column with reset_index
.
In [15]: pd.Series(d).apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().explode().apply(pd.Series).stack().reset_index()
Out[15]:
level_0 level_1 level_2 level_3 level_4 0
0 7453 2H 1155 in playerId 281253
1 7453 2H 1155 in playerId 169212
2 7453 2H 1155 out playerId 449240
3 7453 2H 1155 out playerId 257943
4 7453 2H 2011 in playerId 449089
.. ... ... ... ... ... ...
11 7454 2H 1627 out playerId 56386
12 7454 2H 2725 in playerId 56108
13 7454 2H 2725 out playerId 56383
14 7454 1H 2833 in playerId 56390
15 7454 1H 2833 out playerId 208089
Neither the Series nor the index levels had names. By default it uses the column number (0
) for the values (which should be 'playerId') and level_0
to level_4
for the index levels.
In order to set these appropriately one way is to rename the Series
before calling reset_index
and rename the levels
with rename
afterwards.
I hope that helps