I have a pandas DataFrame in which each "process_id" has several parameters recorded over multiple time steps. I want to extract summary information from these and put it into a new DataFrame (so I don't have to carry around all the details of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function to save the mean of each parameter over the last 5 time steps:
features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])
This works fine, and the column produced by the lambda function ends up with a name like "parameter_lambda" in the table (I'm not sure how, but it works). The problem is that if I add another lambda function, such as this one (or any other lambda definition):
features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(), lambda x: x.iloc[0:int(len(df)/5)].mean()])
I get this error:
Function names must be unique, found multiple named <lambda>
Which makes sense, as both lambda functions will have the same name in the data frame. But I don't know how to get around this.
I tried something like this:
df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})
as described here, but I am getting this error:
SpecificationError: cannot perform renaming for returns with a nested dictionary
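For reference, pandas 0.25 and later document "named aggregation" as the replacement for this nested-dictionary renaming. The following is only a minimal sketch on hypothetical toy data that reuses the 'dummy' and 'returns' column names from the snippet above; it assumes a pandas version that supports keyword-style aggregation.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the snippet above.
df = pd.DataFrame({"dummy": ["a", "a", "b"], "returns": [1.0, 3.0, 5.0]})

# Named aggregation: output_name=(input_column, aggregation).
# Requires pandas >= 0.25 (assumption about the installed version).
result = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(result)
```

Each keyword becomes the output column name, so the renaming happens without a nested dictionary.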
Can someone help me? Thank you!
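One possible workaround, sketched under assumptions rather than checked against the real data: pandas labels each aggregated column with the function's `__name__`, and every lambda is named `<lambda>`, so replacing the lambdas with ordinary named functions should give each column a distinct label. The toy frame, the column `param_a`, and the helper names `last5_mean` and `first_fifth_mean` are hypothetical. The second helper also uses the group's own length `len(x)` instead of `len(df)`, which is a guess at the intent.

```python
import pandas as pd

# Hypothetical toy data: two processes with six time steps each.
df = pd.DataFrame({
    "process_id": [1] * 6 + [2] * 6,
    "param_a": [float(v) for v in range(12)],
})

def last5_mean(x):
    """Mean of the last 5 time steps of one group's column."""
    return x.tail(5).mean()

def first_fifth_mean(x):
    """Mean of the first fifth of the group's rows (at least one row).

    Assumption: the slice is based on the group length, not len(df).
    """
    n = max(1, len(x) // 5)
    return x.iloc[:n].mean()

# Named functions have distinct __name__ values, so the labels do not clash.
features = df.groupby("process_id").agg(
    ["min", "max", "mean", "std", last5_mean, first_fifth_mean]
)
print(features.columns.tolist())
```

The result has MultiIndex columns of the form (parameter, function name), so a single statistic can be selected with something like `features[("param_a", "last5_mean")]`.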