1

I am stumbling over the exact (or at least most elegant) steps to group and aggregate some data in Pandas. Let's say I have a DataFrame that looks something like this -

system    sub_system     datatype     datacount
a         1              foo          111
a         1              bar          222
a         1              baz          333
a         2              foo          444
a         2              baz          555
b         1              foo          667
b         3              baz          777

I'd like to get the sum of datacount while grouping by system and sub_system, as long as the datatype does not equal bar, and then put those totals back into the original dataframe.

If I try non_bar_totals = df[df.datatype != 'bar'].groupby(['system', 'sub_sytem']).agg(np.sum), it'll get me something like -

                        count
system    sub_system
a         1             444
a         2             999
b         1             667
b         3             777

But now I'm not sure how to push that count value back into the original DataFrame. What is the right syntax to get those counts to be pushed back into the original Dataframe? The end product should look like -

system    sub_system     datatype     datacount    non_bar_total
a         1              foo          111          444
a         1              bar          222          444
a         1              baz          333          444
a         2              foo          444          999
a         2              baz          555          999
b         1              foo          667          667
b         3              baz          777          777

Thanks, I know this is something simple, I'm just missing the right keyword to find an example of someone doing it.

Kafonek
  • 362
  • 1
  • 2
  • 10
  • Feels like you should be able to use a transform like `g.transform(lambda x: x['datacount'].where(x['datatype'] != 'bar').sum())` but not quite. – Andy Hayden Feb 28 '14 at 19:17

1 Answers1

0

You can go by using the power of apply function:

def conditional_sum(grp):
    grp['non_bar_total'] = grp[grp.datatype != 'bar']['datacount'].sum()
    return grp

df.groupby(['system', 'sub_system']).apply(conditional_sum)

system  sub_system  datatype    datacount   non_bar_total
0    a   1   foo     111     444
1    a   1   bar     222     444
2    a   1   baz     333     444
3    a   2   foo     444     999
4    a   2   baz     555     999
5    b   1   foo     667     667
6    b   3   baz     777     777
Zero
  • 74,117
  • 18
  • 147
  • 154