Pandas Groupby and Sum Only One Column

Question

So I have a dataframe, df, that looks like the following:

       A      B      C
1     foo    12    California
2     foo    22    California
3     bar    8     Rhode Island
4     bar    32    Rhode Island
5     baz    15    Ohio
6     baz    26    Ohio

I want to group by column A and then sum column B while keeping the value in column C. Something like this:

      A       B      C
1    foo     34    California
2    bar     40    Rhode Island
3    baz     41    Ohio

The issue is, when I say

df.groupby('A').sum()

column C gets removed, returning

      B
A
bar  40
baz  41
foo  34

How can I get around this and keep column C when I group and sum?

Can you just `groupby` A and C? If every value of A doesn't map 1 to 1 to a value of C, then what you're asking isn't possible. If they do map 1 to 1, it should be no trouble to `groupby` both — Kyle Heuton, Aug 16 '16 at 21:54
Yea got it, I had been trying to do multiple values but hadn't been using the proper format which caused me to think I couldn't use multiple values. Thanks! — JSolomonCulp, Aug 16 '16 at 21:59
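A quick way to check the condition raised in the comment above, as a minimal sketch: it rebuilds the question's frame by hand and counts the distinct C values inside each A group. The names `c_per_a` and `a_determines_c` are made up for this illustration.

```python
import pandas as pd

# The sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Number of distinct C values within each A group.
c_per_a = df.groupby("A")["C"].nunique()

# True only if every value of A goes with exactly one value of C.
a_determines_c = bool((c_per_a == 1).all())
print(a_determines_c)
```

If this prints False, grouping by both A and C would split some A groups into several rows, so you would have to decide which C value to keep.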

score 140 · Accepted Answer · edited Aug 05 '21 at 07:30


One way to do this is to include C in your groupby (the groupby method accepts a list of column names). This works as long as each value of A goes with a single value of C.

Give this a try:

df.groupby(['A','C'])['B'].sum()

One other thing to note: by default the result is a Series indexed by A and C. If you need to keep working with the result as a DataFrame after the aggregation, pass as_index=False so that A and C come back as regular columns. This one gave me problems when I was first working with Pandas. Example:

df.groupby(['A','C'], as_index=False)['B'].sum()
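To make the difference between the two calls concrete, here is a small runnable sketch on the question's data. The frame is rebuilt by hand, and `as_series` and `as_frame` are names chosen for this example only.

```python
import pandas as pd

# The sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Default: a Series of sums whose index is a MultiIndex of (A, C).
as_series = df.groupby(["A", "C"])["B"].sum()

# as_index=False: a DataFrame with A, C and B as ordinary columns.
as_frame = df.groupby(["A", "C"], as_index=False)["B"].sum()

print(type(as_series).__name__, list(as_series.index.names))
print(type(as_frame).__name__, list(as_frame.columns))
```

Both results are sorted by the group keys by default, so bar comes before baz and foo.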

edited Aug 05 '21 at 07:30

ah bon

answered Aug 16 '16 at 21:58

Sevyns

Yup, I hadn't realized I needed the [] which made me think you couldn't group multiple columns. Thanks! – JSolomonCulp Aug 16 '16 at 22:00
Glad to help! If you could accept the answer (green check) I'd greatly appreciate it. Best of luck! – Sevyns Aug 16 '16 at 22:01
Maybe we should add the comment that if we want to export this and keep the headers we need to add this line in the end: `df.to_csv("output.csv", header=True, index=True)` – Datacrawler Apr 21 '18 at 11:08

score 21 · Answer 2 · edited Dec 16 '21 at 01:01


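If you don't care which value of column C you keep and just want the nth one in each group, you could do this, where n is an integer position that you pick (for example 0 for the first row of each group):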

df.groupby('A').agg({'B' : 'sum',
                     'C' : lambda x: x.iloc[n]})
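A runnable version of the snippet above, as a sketch: `n = 0` is an assumed choice for illustration, and the frame is the question's data rebuilt by hand. Note that `x.iloc[n]` raises an IndexError for any group with fewer than n + 1 rows.

```python
import pandas as pd

# The sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

n = 0  # assumed position; every group needs at least n + 1 rows

# Sum B per group and take the value at position n of C per group.
out = df.groupby("A").agg({"B": "sum",
                           "C": lambda x: x.iloc[n]})
print(out)
```

The result is indexed by A; call `out.reset_index()` if you want A back as a regular column.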

edited Dec 16 '21 at 01:01

ah bon


answered Aug 16 '16 at 22:02

Kartik


Getting an error on this line: 'c' : lambda x: x.iloc[n]}. The error is: NameError: name 'n' is not defined – S4nd33p May 10 '21 at 10:20
n is just an example, you can put any integer you like there. – Eliyahu Aug 30 '21 at 16:50

score 10 · Answer 3 · answered Apr 02 '22 at 03:34

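Another option is to use groupby.agg with the 'first' aggregation on column "C", which keeps the first C value in each group. Here sort=False keeps the groups in their order of first appearance.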

out = df.groupby('A', as_index=False, sort=False).agg({'B':'sum', 'C':'first'})

Output:

     A   B             C
0  foo  34    California
1  bar  40  Rhode Island
2  baz  41          Ohio
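For comparison, the same aggregation can also be spelled with pandas' named aggregation syntax (available since pandas 0.25). This is an alternative spelling, not the code from the answer above, and the frame is the question's data rebuilt by hand.

```python
import pandas as pd

# The sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Each keyword names an output column: (input column, aggregation).
out = (
    df.groupby("A", as_index=False, sort=False)
      .agg(B=("B", "sum"), C=("C", "first"))
)
print(out)
```

Named aggregation is handy when you want the output column to have a different name than the input, for example `total_B=("B", "sum")`.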
