I have 14,000 dataframes with 30,000 rows each. An example df looks like this:
  urban restricted speed_bin
0  True      False     0-2.5
1  True      False     0-2.5
2  True      False     0-2.5
3  True       True     0-2.5
4  True      False   2.5-4.5
I need to aggregate the data over all the dataframes, but I cannot hold them all in memory at once. I would like to count the unique row combinations in each dataframe. For a single dataframe, I can do:
df.groupby(['urban', 'restricted', 'speed_bin']).size().reset_index(name='count')
urban restricted speed_bin count
0 False False 0-2.5 45
1 False False 2.5-7.5 12
2 False False 7.5-12.5 16
3 False False 12.5-17.5 20
4 False False 17.5-22.5 4
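For context, here is a self-contained version of that step on a toy dataframe built from the example rows above (the data is made up to match the sample, not taken from my real files):

```python
import pandas as pd

# Toy dataframe matching the example rows shown at the top
df = pd.DataFrame({
    "urban":      [True, True, True, True, True],
    "restricted": [False, False, False, True, False],
    "speed_bin":  ["0-2.5", "0-2.5", "0-2.5", "0-2.5", "2.5-4.5"],
})

# Count each unique (urban, restricted, speed_bin) combination
counts = (
    df.groupby(["urban", "restricted", "speed_bin"])
      .size()
      .reset_index(name="count")
)
print(counts)
```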
How can I pivot this so that the speed_bin values become column names, the counts become the cell values, and urban and restricted remain as columns? Keep in mind that a dataframe might not contain an instance of a combination that another dataframe has.
So, in the end I would have four rows, one per combination of urban and restricted, with the corresponding speed-bin counts as columns.
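For what it's worth, here is a sketch of the shape I'm after, using pivot_table. The bin list ALL_BINS and the hard-coded counts are made up for illustration, and reindexing over all four urban/restricted combinations is how I imagine handling combinations that a given dataframe never produces:

```python
import pandas as pd

# Assumption: the full set of possible speed bins is known in advance
ALL_BINS = ["0-2.5", "2.5-4.5"]

# Made-up per-combination counts, as produced by the groupby step
counts = pd.DataFrame({
    "urban":      [True, True, True],
    "restricted": [False, False, True],
    "speed_bin":  ["0-2.5", "2.5-4.5", "0-2.5"],
    "count":      [3, 1, 1],
})

# All four (urban, restricted) combinations, even ones absent from this file
full_idx = pd.MultiIndex.from_product(
    [[False, True], [False, True]], names=["urban", "restricted"]
)

# Pivot so speed bins become columns, then fill in missing rows/columns with 0
wide = (
    counts.pivot_table(index=["urban", "restricted"],
                       columns="speed_bin",
                       values="count",
                       aggfunc="sum",
                       fill_value=0)
          .reindex(index=full_idx, columns=ALL_BINS, fill_value=0)
          .reset_index()
)
print(wide)
```

With every dataframe coerced into this fixed four-row shape, I imagine the 14,000 results could be accumulated by simply adding the count columns together one file at a time, so only one wide table ever sits in memory.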
Thanks in advance!