I have 14,000 dataframes with 30,000 rows each. An example df looks like this:
  urban restricted speed_bin
0  True      False     0-2.5
1  True      False     0-2.5
2  True      False     0-2.5
3  True       True     0-2.5
4  True      False   2.5-4.5
I need to aggregate the data over all the dataframes, but I cannot hold them all in memory at once. I would like to count the unique row combinations in each dataframe. For a single dataframe, I can do:
df.groupby(['urban', 'restricted', 'speed_bin']).size().reset_index(name='count')
urban restricted speed_bin count
0 False False 0-2.5 45
1 False False 2.5-7.5 12
2 False False 7.5-12.5 16
3 False False 12.5-17.5 20
4 False False 17.5-22.5 4
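For context, here is a self-contained version of that step on a toy dataframe built from the example rows above (the data is made up to match the sample, not taken from my real files):

```python
import pandas as pd

# Toy dataframe matching the example rows shown at the top
df = pd.DataFrame({
    "urban":      [True, True, True, True, True],
    "restricted": [False, False, False, True, False],
    "speed_bin":  ["0-2.5", "0-2.5", "0-2.5", "0-2.5", "2.5-4.5"],
})

# Count each unique (urban, restricted, speed_bin) combination
counts = (
    df.groupby(["urban", "restricted", "speed_bin"])
      .size()
      .reset_index(name="count")
)
print(counts)
```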
How can I pivot this so that the speed_bin values become column names, the counts become the cell values, and urban and restricted remain as columns? Keep in mind that a dataframe might not contain an instance of a combination that another dataframe has.
So, in the end I would have four rows, one per combination of urban and restricted, with the corresponding speed-bin counts as columns.
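For what it's worth, here is a sketch of the shape I'm after, using pivot_table. The bin list ALL_BINS and the hard-coded counts are made up for illustration, and reindexing over all four urban/restricted combinations is how I imagine handling combinations that a given dataframe never produces:

```python
import pandas as pd

# Assumption: the full set of possible speed bins is known in advance
ALL_BINS = ["0-2.5", "2.5-4.5"]

# Made-up per-combination counts, as produced by the groupby step
counts = pd.DataFrame({
    "urban":      [True, True, True],
    "restricted": [False, False, True],
    "speed_bin":  ["0-2.5", "2.5-4.5", "0-2.5"],
    "count":      [3, 1, 1],
})

# All four (urban, restricted) combinations, even ones absent from this file
full_idx = pd.MultiIndex.from_product(
    [[False, True], [False, True]], names=["urban", "restricted"]
)

# Pivot so speed bins become columns, then fill in missing rows/columns with 0
wide = (
    counts.pivot_table(index=["urban", "restricted"],
                       columns="speed_bin",
                       values="count",
                       aggfunc="sum",
                       fill_value=0)
          .reindex(index=full_idx, columns=ALL_BINS, fill_value=0)
          .reset_index()
)
print(wide)
```

With every dataframe coerced into this fixed four-row shape, I imagine the 14,000 results could be accumulated by simply adding the count columns together one file at a time, so only one wide table ever sits in memory.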
Thanks in advance!