Using multiple lambda functions with a pandas dataframe

Question

I have a pandas DataFrame in which each "process_id" has different parameters recorded over multiple time steps. I want to extract several pieces of information from these and put them into a new DataFrame (so I don't have to carry around all the details of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function to save the mean of each parameter over the last 5 time steps:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])

This works fine, and the column produced by the lambda function gets a name like "parameter_lambda" in the table (not sure how, but it works). Now the problem is that if I add another lambda function, something like this (or any other lambda definition):

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda x: x.iloc[0:int(len(df)/5)].mean()])

I get this error:

Function names must be unique, found multiple named <lambda>

Which makes sense, as both lambda functions will have the same name in the data frame. But I don't know how to get around this.

I tried something like this:

df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})

as described here, but I am getting this error:

SpecificationError: cannot perform renaming for returns with a nested dictionary

Can someone help me? Thank you!

I'm inclined to think of this as a bug in pandas, if it accepts functions but relies on the `__name__` attribute to distinguish them. — chepner, Feb 10 '19 at 19:31

score 6 · Accepted Answer · answered Feb 10 '19 at 19:21


Every lambda function has the same name, `<lambda>`, so you get the duplicate-name error as soon as more than one column is created by a lambda. Give each lambda its own `__name__` before passing it to agg:

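# Lambdas are all named "<lambda>", so give each one a unique __name__.
fuc1 = lambda x: x.tail(5).mean()
fuc1.__name__ = 'tail_mean'

fuc2 = lambda x: x.iloc[0:int(len(df)/5)].mean()
fuc2.__name__ = 'len_mean'

# The custom names become the column labels in the result.
features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', fuc1, fuc2])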
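
As a self-contained sketch of the same idea, the functions can also be written with `def`, which gives each one a distinct `__name__` without any reassignment. The toy data, the column name `pressure`, and the helper names `tail_mean` and `head_mean` are hypothetical and only for illustration. Note that `head_mean` here uses the length of the group rather than `len(df)`, which is an assumption about the intent.

```python
import pandas as pd

# Hypothetical toy data: two processes, six time steps each.
df = pd.DataFrame({
    "process_id": [1] * 6 + [2] * 6,
    "pressure": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                 10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

def tail_mean(x):
    """Mean of the last 5 rows of the group."""
    return x.tail(5).mean()

def head_mean(x):
    """Mean of the first fifth of the group (at least one row).

    Assumption: the slice is based on the group's length, not len(df).
    """
    n = max(1, len(x) // 5)
    return x.iloc[:n].mean()

# Each function's __name__ becomes a column label, so there is no clash.
features = df.groupby("process_id").agg(
    ["min", "max", "mean", "std", tail_mean, head_mean]
)
print(features)
```

The result has one row per `process_id` and a two-level column index, with `tail_mean` and `head_mean` appearing next to the built-in aggregations.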

answered Feb 10 '19 at 19:21

BENY


That's great! Thanks a lot! – JohnDoe122 Feb 10 '19 at 19:33

@BillKet You're welcome :-) By the way, if this is what you need, would you mind accepting it? – BENY Feb 10 '19 at 19:33

score 0 · Answer 2 · answered Feb 10 '19 at 19:16


Try with x and y instead of x and x:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda y: y.iloc[0:int(len(df)/5)].mean()])

Also, try this:

df.groupby('dummy').agg({'returns': [np.mean, np.sum]})
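
The dict-of-lists form suggested in this answer can be sketched as follows. The toy data is hypothetical, and string names are used in place of `np.mean` and `np.sum` as an assumption that keeps the example free of a NumPy import. This form sidesteps the nested-dictionary renaming error from the question, but on its own it does not give two lambdas distinct names.

```python
import pandas as pd

# Hypothetical toy data matching the column names used in the question.
df = pd.DataFrame({
    "dummy": ["a", "a", "b", "b"],
    "returns": [0.1, 0.3, 0.2, 0.6],
})

# Map each column to a list of aggregations instead of a nested dict.
summary = df.groupby("dummy").agg({"returns": ["mean", "sum"]})
print(summary)
```

The columns of `summary` form a two-level index: the original column name on top and the aggregation name below.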

answered Feb 10 '19 at 19:16

ycx


Thank you, but I am still getting the same error. The name comes purely from "lambda". It doesn't contain the variable in it. – JohnDoe122 Feb 10 '19 at 19:23
I think @Wen-Ben 's answer might be it. I learnt something new too – ycx Feb 10 '19 at 19:25
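
The point made in these comments, that the name comes from the `lambda` keyword and not from the argument, can be checked in plain Python. This is a minimal sketch; the variable names `f`, `g`, and `names_before` are only for illustration.

```python
# Two lambdas that differ only in their argument name.
f = lambda x: x
g = lambda y: y

# Both report the same auto-generated name, regardless of the argument.
names_before = (f.__name__, g.__name__)
print(names_before)

# Assigning __name__ gives each function a distinct label.
f.__name__ = "first"
g.__name__ = "second"
print(f.__name__, g.__name__)
```

This is why renaming the argument from x to y leaves the error unchanged, while setting `__name__` resolves it.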
