Hi, I want to create a DataFrame from a list of dicts whose values are lists. When the values are scalars (see test below), the call to pd.DataFrame works as expected:
import pandas as pd

test = [{'points': 40, 'time': '5:00', 'year': 2010},
        {'points': 25, 'time': '6:00', 'month': "february"},
        {'points': 90, 'time': '9:00', 'month': 'january'},
        {'points_h1': 20, 'month': 'june'}]
pd.DataFrame(test)
      month  points  points_h1  time    year
0       NaN    40.0        NaN  5:00  2010.0
1  february    25.0        NaN  6:00     NaN
2   january    90.0        NaN  9:00     NaN
3      june     NaN       20.0   NaN     NaN
However, if the values are themselves lists, I get a result I did not expect:
test = [{'points': [40, 50], 'time': ['5:00', '4:00'], 'year': [2010, 2011]},
        {'points': [25], 'time': ['6:00'], 'month': ["february"]},
        {'points': [90], 'time': ['9:00'], 'month': ['january']},
        {'points_h1': [20], 'month': ['june']}]
pd.DataFrame(test)
      month    points  points_h1          time          year
0       NaN  [40, 50]        NaN  [5:00, 4:00]  [2010, 2011]
1  february        25        NaN          6:00           NaN
2   january        90        NaN          9:00           NaN
3      june       NaN       20.0           NaN           NaN
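A quick check that may explain the result: the constructor builds one row per dict, so a list value is not expanded into rows but stored whole, as a single Python object, in an object-dtype cell. This is a minimal sketch using only the test data above.

```python
import pandas as pd

test = [{'points': [40, 50], 'time': ['5:00', '4:00'], 'year': [2010, 2011]},
        {'points': [25], 'time': ['6:00'], 'month': ['february']},
        {'points': [90], 'time': ['9:00'], 'month': ['january']},
        {'points_h1': [20], 'month': ['june']}]

df = pd.DataFrame(test)
# One row per dict, regardless of how long the lists inside it are.
print(len(df))
# The first cell of 'points' holds the whole list object.
print(type(df.loc[0, 'points']), df['points'].dtype)
```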
To work around this, I use pd.concat([pd.DataFrame(z) for z in test]), but this is relatively slow because it builds a separate DataFrame for every dict in the list, and each construction carries significant overhead. Am I missing something?
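One possible alternative, sketched here and not benchmarked: do the row expansion in plain Python and call the DataFrame constructor only once. It assumes that all lists within a single dict have the same length; explode_records is a hypothetical helper name, not a pandas function. It should give the same rows as the pd.concat approach, but with a fresh 0..n-1 index instead of the repeated labels that concat produces without ignore_index=True.

```python
import pandas as pd

test = [{'points': [40, 50], 'time': ['5:00', '4:00'], 'year': [2010, 2011]},
        {'points': [25], 'time': ['6:00'], 'month': ['february']},
        {'points': [90], 'time': ['9:00'], 'month': ['january']},
        {'points_h1': [20], 'month': ['june']}]


def explode_records(records):
    """Hypothetical helper: turn each dict of equal-length lists into
    one flat dict per list position (one output row per element)."""
    rows = []
    for rec in records:
        keys = list(rec.keys())
        # zip(*values) walks the lists in parallel; it stops at the
        # shortest list, so unequal lengths would silently drop data.
        for values in zip(*rec.values()):
            rows.append(dict(zip(keys, values)))
    return rows


# A single DataFrame construction instead of one per dict plus a concat;
# keys missing from a row become NaN, as in the scalar case.
df = pd.DataFrame(explode_records(test))
print(df)
```

The design choice here is to keep the per-dict work in cheap Python dict operations and leave pandas a single list of flat records, which is the input shape the constructor already handles well in the scalar example.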