0

I have a data frame with the following data

date       type       value
1/1/2016   a          1
1/1/2016   b          2
1/1/2016   a          1
1/1/2016   b          4
1/2/2016   a          1
1/2/2016   b          1

I'm trying to figure out how to create conditional aggregators after grouping by date, to perform an operation both across the entire day as well as for a particular type, with output as below

date       sum_value    avg_value    a_sum     a_max    b_sum    b_max
1/1/2016   8            2            2         1        6        4     
1/2/2016   2            1            1         1        1        1
cs95
  • 379,657
  • 97
  • 704
  • 746
James Li
  • 85
  • 1
  • 1
  • 3

1 Answers1

1

You'll need to handle this in 2 steps. The first step is the get mean and sum through a df.groupby call on date only.

x = df.groupby('date').value.agg(['mean', 'sum'])
print(x)

          mean  sum
date               
1/1/2016     2    8
1/2/2016     1    2

The next 4 columns need a groupby on date and type followed by an unstacking.

y = df.groupby(['date', 'type']).value.agg(['sum', 'max']).unstack()
y.columns = ['_'.join(col) for col in x.columns.values] # https://stackoverflow.com/a/45878539/4909087
print(y)

          sum_a  sum_b  max_a  max_b
date                                
1/1/2016      2      6      1      4
1/2/2016      1      1      1      1

Now, concatenate x and y.

res = pd.concat([x, y], 1).reset_index()
print(res) 
       date  mean  sum  sum_a  sum_b  max_a  max_b
0  1/1/2016     2    8      2      6      1      4
1  1/2/2016     1    2      1      1      1      1
cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    unstacking, that's the part I was missing, thank you!!!!! that's what happens when you skim a quarter of the way through the pandas docs I guess lol. – James Li Aug 25 '17 at 12:39