3

I have a pandas DataFrame in which each "process_id" has several parameters recorded over multiple time steps. I want to extract summary information from these and put it into a new data frame (so I don't have to carry all the details of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function to save the mean of each parameter over the last 5 time steps:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])

This works fine, and the column produced by the lambda function gets a name in the table, something like "parameter_lambda" (not sure how, but it works). The problem is that if I add another lambda function, something like this (or any other lambda definition):

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda x: x.iloc[0:int(len(df)/5)].mean()])

I get this error:

Function names must be unique, found multiple named <lambda>

Which makes sense, as both lambda functions will have the same name in the data frame. But I don't know how to get around this.

I tried something like this:

df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})

as described here, but I am getting this error:

SpecificationError: cannot perform renaming for returns with a nested dictionary

Can someone help me? Thank you!

JohnDoe122

2 Answers

6

Every lambda has the same __name__, "<lambda>", so agg raises the duplicate-name error as soon as the list contains more than one of them. Assigning each lambda its own __name__ avoids the clash:

# agg uses each function's __name__ as the column label, so make them distinct
fuc1 = lambda x: x.tail(5).mean()
fuc1.__name__ = 'tail_mean'

fuc2 = lambda x: x.iloc[0:int(len(df)/5)].mean()
fuc2.__name__ = 'len_mean'

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', fuc1, fuc2])
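
For anyone who wants to try the renaming trick end to end, here is a minimal sketch on toy data. The "temperature" column, its values, and the name head_mean are made up for illustration, and the second lambda takes the first 2 rows of each group rather than the len(df)/5 slice from the question.

```python
import pandas as pd

# Hypothetical toy data: two processes with six time steps each.
df = pd.DataFrame({
    "process_id": [1] * 6 + [2] * 6,
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                    10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Mean of the last 5 rows of each group.
tail_mean = lambda x: x.tail(5).mean()
tail_mean.__name__ = "tail_mean"

# Mean of the first 2 rows of each group (assumed slice, for illustration).
head_mean = lambda x: x.iloc[:2].mean()
head_mean.__name__ = "head_mean"

features = df.groupby("process_id").agg(["min", "max", tail_mean, head_mean])
print(features)
```

The resulting columns form a MultiIndex of (parameter, function name) pairs, so the two custom aggregations can be selected as ("temperature", "tail_mean") and ("temperature", "head_mean").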
BENY
0
features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda y: y.iloc[0:int(len(df)/5)].mean()])

Try using x and y as the lambda arguments instead of x for both.

df.groupby('dummy').agg({'returns': [np.mean, np.sum]})

Also, try this
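
If the point of the nested dictionary was to get custom column names such as 'Mean' and 'Sum', named aggregation may be another way to do it. This is only a sketch: it assumes pandas 0.25 or later, and the toy values for 'dummy' and 'returns' are made up.

```python
import pandas as pd

# Hypothetical toy data using the column names from the question.
df = pd.DataFrame({
    "dummy": ["a", "a", "b"],
    "returns": [1.0, 3.0, 5.0],
})

# Named aggregation: output_column=(input_column, aggregation).
result = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(result)
```

Each keyword becomes a flat column name in the result, so no renaming step is needed afterwards.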

ycx
  • Thank you, but I am still getting the same error. The name comes purely from "lambda". It doesn't contain the variable in it. – JohnDoe122 Feb 10 '19 at 19:23
  • I think @Wen-Ben 's answer might be it. I learnt something new too – ycx Feb 10 '19 at 19:25
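
To illustrate the point made in the comments, a short plain-Python check shows that the argument name has no effect on a lambda's name. The variable names f and g and the label head_mean are hypothetical.

```python
# Two lambdas with different argument names, as in the x / y suggestion.
f = lambda x: x.tail(5).mean()
g = lambda y: y.iloc[:2].mean()

# Both report the same name, regardless of the argument used.
print(f.__name__)
print(g.__name__)

# The name only changes when it is assigned explicitly.
g.__name__ = "head_mean"
print(g.__name__)
```

This is why swapping x for y leaves the error unchanged, while assigning __name__ resolves it.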