2

I have 14,000 dataframes with 30,000 rows each. An example df looks like this urban restricted speed_bin True False 0-2.5 True False 0-2.5 True False 0-2.5 True True 0-2.5 True False 2.5-4.5

I need to aggregate the data over all the dataframes, but can not hold it all in memory. I would like to count the unique row combinations for each dataframe. For a single dataframe, I can do

df.groupby(['urban','restricted','speed_bin']).size().reset_index() \
                                               .rename(columns={0:'count'})

urban   restricted  speed_bin   count
0   False   False   0-2.5       45
1   False   False   2.5-7.5     12
2   False   False   7.5-12.5    16
3   False   False   12.5-17.5   20
4   False   False   17.5-22.5   4

How can I transpose this and create column names out of the speed_bin combinations while making the count column the row content, and leaving urban and speed_bin columns? Keep in mind that a dataframe might not have an instance of a possible combination another dataframe might have.

So, in the end I would have four rows corresponding to the combinations of urban and restricted with the corresponding speed bins counts.

Thanks in advance!

leonard
  • 795
  • 1
  • 9
  • 18
  • What do you mean "and leaving urban and speed_bin columns"? That seems to be at odds with the last sentence. – Alex Feb 09 '18 at 00:31

2 Answers2

0
df.set_index(['urban', 'restricted', 'speed_bin'])['count'].unstack()

returns

speed_bin         0-2.5  12.5-17.5  17.5-22.5  2.5-7.5  7.5-12.5
urban restricted                                                
False False          45         20          4       12        16
Alex
  • 18,484
  • 8
  • 60
  • 80
0

One way is:

df.pivot_table(index=['urban', 'restricted'], values=['count'], columns=['speed_bin'])

This returns:

#                  count                                     
# speed_bin        0-2.5 12.5-17.5 17.5-22.5 2.5-7.5 7.5-12.5
# urban restricted                                           
# False False         45        20         4      12       16
jpp
  • 159,742
  • 34
  • 281
  • 339