49

I have a pandas dataframe below:

    df

    name    value1    value2  otherstuff1 otherstuff2 
0   Jack       1         1       1.19        2.39     
1   Jack       1         2       1.19        2.39
2   Luke       0         1       1.08        1.08  
3   Mark       0         1       3.45        3.45
4   Luke       1         0       1.08        1.08

Same name will have the same value for otherstuff1 and otherstuff2.

I'm trying to group by column name and sum both columns value1 and value2. (Not sum value1 with value2!!! But sum them individually in each column.)

Expecting to get result below:

    newdf

    name    value1    value2  otherstuff1 otherstuff2 
0   Jack       2         3       1.19        2.39     
1   Luke       1         1       1.08        1.08  
2   Mark       0         1       3.45        3.45

I've tried

newdf = df.groupby(['name'], as_index=False).sum()

which groups by name and sums up both value1 and value2 columns correctly, but ends up dropping columns otherstuff1 and otherstuff2.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
SwagZ
  • 759
  • 1
  • 9
  • 16

4 Answers4

70

You should specify what pandas must do with the other columns. In your case, I think you want to keep one row, regardless of its position within the group.

This could be done with agg on a group. agg accepts a parameter that specifies what operation should be performed for each column.

df.groupby(['name'], as_index=False).agg({'value1': 'sum', 'value2': 'sum', 'otherstuff1': 'first', 'otherstuff2': 'first'})
Guybrush
  • 2,680
  • 1
  • 10
  • 17
  • 12
    what if I have a lot of otherstuff columns? Should I loop through all other columns and create a dictionary ? – SwagZ Apr 11 '18 at 19:45
  • 2
    You can first compute the sum on the columns you want by dropping the other ones, then merge the resulting dataframe with the old one on its index. – Guybrush Apr 11 '18 at 20:38
  • 1
    @Guybrush but the merging shouldn't be working, because the new and old df have different lengths, right? Can you provide example code for how this would work? – NeStack Mar 31 '20 at 09:51
  • @NeStack They will have different lengths, but that's the purpose of aggregating rows, isn't it? If there are other columns that contain "useful information", then either these information are the same for all rows in a given group (and in that case, an inner-join will do the job) or they are different (and in that case, a left-join will do the job). – Guybrush Apr 01 '20 at 13:56
  • why is not works with count? Got all zeros in counts column. – Peter.k Feb 20 '23 at 17:23
23

Something like ?(Assuming you have same otherstuff1 and otherstuff2 under the same name )

df.groupby(['name','otherstuff1','otherstuff2'],as_index=False).sum()
Out[121]: 
   name  otherstuff1  otherstuff2  value1  value2
0  Jack         1.19         2.39       2       3
1  Luke         1.08         1.08       1       1
2  Mark         3.45         3.45       0       1
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thx for the quick response. But it's possible different name share the same value for otherstuff1 and otherstuff2. – SwagZ Apr 11 '18 at 19:41
  • 1
    @SwagZ still the combination (group key ) will help it, So you mean , same name with have multiple otherstuff1 and otherstuff2? – BENY Apr 11 '18 at 19:41
  • Not sure if I follow you. Could you be more specific and provide more detail. Thx in advance. – SwagZ Apr 11 '18 at 19:43
  • Different name with same value for otherstuff1 and otherstuff2. Example being: Luke has value1 1.08 and value2 1.08. There is a new Zack who also has value1 1.08 and value2 1.08. – SwagZ Apr 11 '18 at 19:47
  • @SwagZ then .....The name is different right ? name + otherstuff1 + otherstuff2 will be unique combination , Luke 1.08 1.08. no equal to Zack 1.08 1.08, so it will not belong to one group ...I will recommend check the official document for groupby . – BENY Apr 11 '18 at 19:49
  • I think I misunderstood the meaning of how groupby use combination(group key). I thought groupby uses each key inside the group to groupby instead of using the whole combination. Finally got it figured out now. Thank you so much! – SwagZ Apr 11 '18 at 19:51
4

The key in the answer above is actually the as_index=False, otherwise all the columns in the list get used in the index.

p_summ = p.groupby( attributes_list, as_index=False ).agg( {'AMT':sum })
ggorlen
  • 44,755
  • 7
  • 76
  • 106
Graven74
  • 49
  • 3
3

These solutions are great, but when you have too many columns, you do not want to type all of the column names. So here is what I came up with:

column_map = {col: "first" for col in df.columns}
column_map["col_name1"] = "sum"
column_map["col_name2"] = lambda x: set(x) # it can also be a function or lambda

now you can simply do

df.groupby(["col_to_group"], as_index=False).aggreagate(column_map)
Berkay Berabi
  • 1,933
  • 1
  • 10
  • 26