2

I have a data frame with a list of processes and the time they took as follows

enter image description here

I would like to get the following result

enter image description here

I know how to use gorupby in order to get ONE but only one of those columns. And this is the way I solve the problem

# the data
ps    = ['p1','p2','p3','p4','p2','p2','p3','p6','p2','p4','p5','p6']
times = [20,10,2,3,4,5,6,3,4,5,6,7]
processes = pd.DataFrame({'ps':ps,'time':times})

# the series
dfsum   = processes.groupby('PROCESS')['TIME'].sum()
dfcount = processes.groupby('PROCESS')['TIME'].count()

# "building" the df result
frame = { 'total_time': dfsum, 'total_nr': dfcount} 
dfresult = pd.DataFrame(frame)
dfresult['average']= dfresult['total_time']/dfresult['total_nr']
dfresult

but how to get the desired df without having to compose it column by column? For me this method is not "pandonic" enough (nor pythonic)

JFerro
  • 3,203
  • 7
  • 35
  • 88
  • 1
    Have a look at [named aggregation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#named-aggregation): ``processes.groupby("ps").agg(total_time=("time", "sum"),avg_time=("time", "size"))`` – sammywemmy Jan 18 '21 at 20:46

2 Answers2

3

Try groupby.agg():

df.groupby('PROCESS')['TIME'].agg(['sum','mean','count'])

Output for the sample data:

    sum   mean  count
ps                   
p1   20  20.00      1
p2   23   5.75      4
p3    8   4.00      2
p4    8   4.00      2
p5    6   6.00      1
p6   10   5.00      2
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
3
processes.groupby('ps').agg(TOTAL_TIME=('time','sum'),AVERAGE=('time','mean'),NRTIMES=('time','size'))
creanion
  • 2,319
  • 2
  • 13
  • 17
wwnde
  • 26,119
  • 6
  • 18
  • 32