0

I have a Pandas dataset with 3 columns. I need to group by the ID column while finding the sum and count of the other two columns. Also, I have to ignore the zeroes in the columsn 'A' and 'B'.

The dataset looks like -

ID   A    B  

1    0    5  
2   10    0   
2   20    0  
3    0   30   

What I need -

ID   A_Count   A_Sum   B_Count   B_Sum  

1      0         0        1        5  
2      2        30        0        0  
3      0         0        1       30  

I have tried this using one column but wasn't able to get both the aggregations in the final dataset.

(df.groupby('ID').agg({'A':'sum', 'A':'count'}).reset_index().rename(columns = {'A':'A_sum', 'A': 'A_count'}))  
Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51
Vjain
  • 13
  • 2
  • Probable Duplicate : https://stackoverflow.com/questions/38174155/group-dataframe-and-get-sum-and-count – min2bro May 02 '19 at 09:13
  • Possible duplicate of [Group dataframe and get sum AND count?](https://stackoverflow.com/questions/38174155/group-dataframe-and-get-sum-and-count) – Georgy May 02 '19 at 09:57

1 Answers1

1

If you don't pass it columns specifically, it will aggregate the numeric columns by itself.

Since your don't want to count 0, replace them with NaN first:

df.replace(0, np.NaN, inplace=True)

print(df)
   ID     A     B
0   1   NaN   5.0
1   2  10.0   NaN
2   2  20.0   NaN
3   3   NaN  30.0

df = df.groupby('ID').agg(['count', 'sum'])

print(df)
       A           B      
   count   sum count   sum
ID                        
1      0   0.0     1   5.0
2      2  30.0     0   0.0
3      0   0.0     1  30.0

Remove MultiIndex columns

You can use list comprehension:

df.columns = ['_'.join(col) for col in df.columns]

print(df)
    A_count  A_sum  B_count  B_sum
ID                                
1         0    0.0        1    5.0
2         2   30.0        0    0.0
3         0    0.0        1   30.0
Erfan
  • 40,971
  • 8
  • 66
  • 78