
I have a MultiIndex DataFrame with multiple test result values. For further data analysis I want to add the derivative to the DataFrame.

I tried to calculate it with a lambda function directly after grouping the DataFrame. Grouping (mean values) is required because of noise in the sampling. Later I want to delete the rows where the derivative is <= 0.

The simplified Multi-index dataframe looks like this:

import pandas as pd

arrays = [['LS13', 'LS13', 'LS13', 'LS13', 'LS14', 'LS14', 'LS14', 'LS14', 'LS14', 'LS14', 'LS14', 'LS14'],
          [0, 2, 2.5, 3, 0, 2, 5, 5.5, 6, 6.5, 7, 7.5]]
index = pd.MultiIndex.from_arrays(arrays, names=('File', 'Flow Rate Setpoint [l/s]'))
df = pd.DataFrame({
    ('Flow Rate [l/s]', 'mean'): [-0.057, 2.089, 2.496, 3.011, 0.056, 2.070, 4.995, 5.519, 6.011, 6.511, 7.030, 7.499],
    ('Time [s]', 'mean'): [42.225, 104.909, 165.676, 226.446, 42.225, 104.918, 469.560, 530.328, 591.100, 651.864, 712.660, 773.034],
    ('Shear Stress [Pa]', 'mean'): [-0.698, 5.621, 7.946, 11.278, -0.774, 6.557, 40.610, 48.370, 54.685, 58.414, 58.356, 56.254],
}, index=index)

If I run my code:

import numpy as np

xls = ['LS13', 'LS14']

gradient = [pd.Series(np.gradient(df.loc[(i),('Shear Stress [Pa]','mean')],df.loc[(i),('Time [s]','mean')])) for i in xls]

Now I want to concatenate gradient to df along axis=1. The column title could be df[('Gradient', 'values')].

The gradient values look like this:

    Gradient
     values
                
0   0.100808
1   0.069048
2   0.04654
3   0.054801
0   0.116941
1   0.087431
2   0.149521
3   0.115805
4   0.082639
5   0.030213
6   -0.017938
7   -0.034806

The next step would be to drop the rows where ('Gradient', 'values') <= 0, in my example the 'LS14' rows with setpoints 7 and 7.5.

When I tried to concatenate the DataFrame df and the Series gradient (I'm aware that the indexes are different):

merged = pd.concat([pd.DataFrame(df),pd.Series(gradient)], axis=1 , ignore_index = True)

Errors are usually one of the following:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

TypeError: cannot concatenate object of type "<class 'list'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

I would also assume there is an easier way to get this done with a lambda function applied in place.

merged = pd.concat([df, pd.Series([gradient], name=('Gradient','value'))], axis=1)

I would have expected that to work, but I also get a mismatch error:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

When I try:

df[("Gradient","value")] =pd.Series([pd.Series(np.gradient(df.loc[(i),('Shear Stress [Pa]','mean')],df.loc[(i),('Time [s]','mean')])) for i in xls])

The ('Gradient', 'value') column is added to the DataFrame, but its values are all NaN.

Red

1 Answer


You can try groupby().apply():

def get_gradients(x):
    gradients = np.gradient(x[('Shear Stress [Pa]', 'mean')],x[('Time [s]', 'mean')] )
    return pd.Series(gradients, index=x.index)

df[('Gradient','Value')] = (df.groupby('File', group_keys=False)
                              .apply(get_gradients)
                           )
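
For reference, here is a self-contained sketch of the approach above run on the question's sample data, followed by the filtering step the question asks for. Only the two columns needed for the gradient are rebuilt here. The name `filtered` and the use of `.to_numpy()` are choices made for this illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the columns needed for the gradient).
arrays = [
    ['LS13'] * 4 + ['LS14'] * 8,
    [0, 2, 2.5, 3, 0, 2, 5, 5.5, 6, 6.5, 7, 7.5],
]
index = pd.MultiIndex.from_arrays(arrays, names=('File', 'Flow Rate Setpoint [l/s]'))
df = pd.DataFrame({
    ('Time [s]', 'mean'): [42.225, 104.909, 165.676, 226.446, 42.225, 104.918,
                           469.560, 530.328, 591.100, 651.864, 712.660, 773.034],
    ('Shear Stress [Pa]', 'mean'): [-0.698, 5.621, 7.946, 11.278, -0.774, 6.557,
                                    40.610, 48.370, 54.685, 58.414, 58.356, 56.254],
}, index=index)


def get_gradients(x):
    # d(shear stress)/d(time) within one file, using the actual time stamps.
    gradients = np.gradient(x[('Shear Stress [Pa]', 'mean')].to_numpy(),
                            x[('Time [s]', 'mean')].to_numpy())
    # Reuse the group's own index so the result aligns with df on assignment.
    return pd.Series(gradients, index=x.index)


df[('Gradient', 'Value')] = (df.groupby('File', group_keys=False)
                               .apply(get_gradients))

# Keep only the rows with a strictly positive gradient.
filtered = df[df[('Gradient', 'Value')] > 0]
```

On this data the filter removes the two 'LS14' rows with setpoints 7 and 7.5, which are the rows the question identifies as having a non-positive gradient.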
Quang Hoang
  • it worked, but I don't really understand why I have to group it again – Otto von Hintenwievonvorn Jun 20 '19 at 16:53
  • You only group it once as in my solution. The reason your solution didn't work is that `df` has different index than `pd.Series(gradient)`. That also is the reason why I had to specify `index=x.index` in my solution. – Quang Hoang Jun 20 '19 at 17:03
  • Thanks for the explanation. Another thing I'm still struggling with which is probably based on the same issue is, that I want to ' Zero ' my measurement results based on the first row of data for each group 'File'. ```for x in xls: df[x,('Shear Stress zero [Pa]')] = pd.DataFrame(df.loc[x,('Shear Stress [Pa]')].values[:] - df.loc[x,('Shear Stress [Pa]')].values[0])``` Works partially but just gives me an array which index is off. I tried to follow your scheme to accomplish it but I get ```KeyError: 'File'``` – Otto von Hintenwievonvorn Jun 21 '19 at 13:01
  • based on your solution earlier I used: ```def set_zero(x): zeroShear = x[('Shear Stress [Pa]')] - x[('Shear Stress [Pa]')].values[0] return pd.Series(zeroShear, index = x.index) df_full[('Shear Stress zero [Pa]')] = (df_full.groupby('File', group_keys=False) .apply(set_zero))``` – Otto von Hintenwievonvorn Jun 21 '19 at 13:07
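
The zeroing step discussed in the comments could be sketched as follows. This is a swapped-in technique (`groupby(...).transform('first')`), not the code from the thread, and it assumes that 'File' is a named index level as in the question's `df`. If the frame being grouped has neither a column nor an index level called 'File', `groupby('File')` raises `KeyError: 'File'`, which may be what happened with `df_full`. The sub-column label `'mean'` on the new column is also an assumption.

```python
import pandas as pd

# Sample data from the question (shear stress column only).
arrays = [
    ['LS13'] * 4 + ['LS14'] * 8,
    [0, 2, 2.5, 3, 0, 2, 5, 5.5, 6, 6.5, 7, 7.5],
]
index = pd.MultiIndex.from_arrays(arrays, names=('File', 'Flow Rate Setpoint [l/s]'))
df = pd.DataFrame({
    ('Shear Stress [Pa]', 'mean'): [-0.698, 5.621, 7.946, 11.278, -0.774, 6.557,
                                    40.610, 48.370, 54.685, 58.414, 58.356, 56.254],
}, index=index)

shear = df[('Shear Stress [Pa]', 'mean')]

# Broadcast each file's first value to every row of that file.
# The result keeps the original index, so the subtraction aligns row by row.
# Note: 'first' picks the first non-null value in each group.
baseline = shear.groupby(level='File').transform('first')

df[('Shear Stress zero [Pa]', 'mean')] = shear - baseline
```

Because `transform` returns a Series with the same index as its input, no manual re-indexing is needed before assigning the new column.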