8

Suppose I have a Pandas data frame as follows:

Test Parameter Value

X1     0        0.033285423511615113
X1     1        0.78790279861666179
X1     2        0.79136989638378297
X1     3        0.80063190842016707
X1     4        0.7884653622402551
X1     5        0.78561849214309198...
...
X1     22       0.82241991278171311...
...
X2 ...

I'd like to get the row with Parameter value 3, i.e. the row with the last increasing value before the first drop. Notice that later on we might have higher values (e.g. row 22). Essentially, I'm trying to get the "last" value before the "first" decrease.

Also note that there are multiple Tests, so I probably need to do something like:

myDF.groupby("Test").Something
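
For concreteness, here is a minimal reproducible frame matching the sample above, with values rounded and the elided rows omitted:

import pandas as pd

# sample data from the question, rounded; rows hidden behind "..." are left out
df = pd.DataFrame({
    'Test': ['X1'] * 7,
    'Parameter': [0, 1, 2, 3, 4, 5, 22],
    'Value': [0.033285, 0.787903, 0.791370, 0.800632,
              0.788465, 0.785618, 0.822420],
})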
Bernhard Barker
user1357015

5 Answers

7

Coldspeed nearly has it; to keep only the first run of increases you can use cumprod, or similar, e.g.

In [11]: df[((df.Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1)
Out[11]:
  Test  Parameter     Value
3   X1          3  0.800632

The trick being:

In [12]: (df.Value.diff().fillna(1) > 0)
Out[12]:
0     True
1     True
2     True
3     True
4    False
5    False
6     True
Name: Value, dtype: bool

In [13]: (df.Value.diff().fillna(1) > 0).cumprod()
Out[13]:
0    1
1    1
2    1
3    1
4    0
5    0
6    0
Name: Value, dtype: int64

Note: My df is this:

In [21]: df
Out[21]:
  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X1          5  0.785618
6   X1         22  0.822420
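
If there are multiple Tests (as the comments below ask), a per-group sketch of the same trick, assuming the frame above, is:

# 1 while the value is still rising within each Test
inc = (df.groupby('Test').Value.diff().fillna(1) > 0).astype(int)
# cumprod per group: 1 until each group's first drop, 0 afterwards
keep = inc.groupby(df['Test']).cumprod().eq(1)
df[keep].groupby('Test').tail(1)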
Andy Hayden
  • There's definitely a nicer way to do this, but I don't recall it. – Andy Hayden Oct 22 '17 at 05:29
  • Cumprod is a very nice trick that has eluded me for ever, gotta remember that. +1'd – cs95 Oct 22 '17 at 05:33
  • @cᴏʟᴅsᴘᴇᴇᴅ you can do a similar trick with cumsum if you can make the changes (rows that "change") True... then you can groupby the result for example, I think that's what I was thinking of/trying to recall. – Andy Hayden Oct 22 '17 at 05:36
  • @AndyHayden: I think this is it -- one question though, if there are multiple tests, how do I groupBy test. Would it be: df[((df.groupby("Test"). Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1) – user1357015 Oct 22 '17 at 13:12
6

Use np.diff: it naturally reduces the length of the array by one, and when combined with np.flatnonzero it identifies the ordinal positions just before each drop.

df.iloc[[np.flatnonzero(np.diff(df.Value) < 0)[0]]]

  Test  Parameter     Value
3   X1          3  0.800632

Note:
We can speed this up by accessing the underlying numpy array

df.iloc[[np.flatnonzero(np.diff(df.Value.values) < 0)[0]]]

Explanation

Get differences

np.diff(df.Value)

array([ 0.754618,  0.003467,  0.009262, -0.012167, -0.002847,  0.036802])

Find where differences are negative

np.flatnonzero(np.diff(df.Value) < 0)

array([3, 4])

I want the first one

np.flatnonzero(np.diff(df.Value) < 0)[0]

3

Use double brackets with iloc so we get back a one-row DataFrame rather than a Series

df.iloc[[3]]

  Test  Parameter     Value
3   X1          3  0.800632
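
For contrast, single brackets would pull out a Series instead:

df.iloc[3]     # Series (the row's values, labelled by column)
df.iloc[[3]]   # one-row DataFrame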

The groupby version looks like:

f = lambda d: d.iloc[[np.flatnonzero(np.diff(d.Value.values) < 0)[0]]]
df.groupby('Test').apply(f)

       Test  Parameter     Value
Test                            
X1   3   X1          3  0.800632
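
One caveat (my note, not from the answer): if a group never decreases, flatnonzero returns an empty array and [0] raises an IndexError. A guarded variant could fall back to the group's last row:

def last_before_drop(d):
    # positions whose next value is lower
    drops = np.flatnonzero(np.diff(d.Value.values) < 0)
    # if the group never decreases, fall back to its final row
    pos = drops[0] if len(drops) else len(d) - 1
    return d.iloc[[pos]]

df.groupby('Test').apply(last_before_drop)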
piRSquared
3

Use diff + tail:

df    
  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X1          5  0.785618

df[df.Value.diff().gt(0)].tail(1)    
  Test  Parameter     Value
3   X1          3  0.800632

This will retrieve the last local maximum. If you want the first local maximum, refer to Andy Hayden's solution involving cumprod.


If you're doing this in a groupby operation, it'd be something like (borrowing from Andy):

df.groupby('Test', group_keys=False)\
      .apply(lambda x: x[((x.Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1))
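
To make the distinction concrete on the sample frame (which includes the later rise at Parameter 22):

df[df.Value.diff().gt(0)].tail(1)                            # row 6, the last rise
df[(df.Value.diff().fillna(1) > 0).cumprod() == 1].tail(1)   # row 3, end of the first run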
cs95
3

I think max can do it ...

df.sort_values('Value', ascending=False).drop_duplicates(['Test'])
Out[226]: 
  Test  Parameter     Value
3   X1          3  0.800632

Or

df[df['Value'] == df.groupby(['Test'])['Value'].transform(max)]
Out[227]: 
  Test  Parameter     Value
3   X1          3  0.800632
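
A caveat (my note): both lines return the per-Test maximum, so on the full sample with the Parameter-22 row included they would return that later peak rather than Parameter 3:

# with the Parameter-22 row present, the global max is no longer row 3
df.sort_values('Value', ascending=False).drop_duplicates(['Test'])
#   Test  Parameter     Value
# 6   X1         22  0.822420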

It seems this is what you need... anyway, here is an (admittedly ugly) way to correct my old post:

df1 = df.iloc[np.flatnonzero(df.Value.diff().fillna(1) > 0)].reset_index()
df1.groupby(df1['index'].diff().ne(1).cumsum()).last().iloc[0]
Out[289]: 
index               3
Test               X1
Parameter           3
Value        0.800632
Name: 1, dtype: object
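
The run-labelling step works because positions within one unbroken rise differ by exactly 1; a quick illustration (example values are mine):

s = pd.Series([0, 1, 2, 3, 6])   # positions where the value rose
s.diff().ne(1).cumsum()
# 0    1
# 1    1
# 2    1
# 3    1
# 4    2    <- position 6 starts a new run, so it gets a new label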

For the groupby case:

l = []
for _, dfs in df.groupby('Test'):
    df1 = dfs.iloc[np.flatnonzero(dfs.Value.diff().fillna(1) > 0)].reset_index()
    l.append(df1.groupby(df1['index'].diff().ne(1).cumsum()).last().iloc[0].to_frame().T)

pd.concat(l, axis=0)
BENY
3

Also, from scipy's argrelextrema we can do the following (from finding local maxima):

from scipy.signal import argrelextrema
maxInd = argrelextrema(df['Value'].values, np.greater)
df.iloc[maxInd[0][:1]]
  Test  Parameter     Value
3   X1          3  0.800632

A groupby solution, if you have a dataframe like this:

  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X2          5  0.785618
6   X2         22  0.822420
7   X2          5  0.785618

def get_maxima(x):
    return x.iloc[argrelextrema(x['Value'].values,np.greater)[0][:1]]

df.groupby('Test').apply(get_maxima)

Output:

       Test  Parameter     Value
Test
X1   3   X1          3  0.800632
X2   6   X2         22  0.822420
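
One caveat (my note, not from the answer): argrelextrema with np.greater only reports strict interior maxima, so a group that rises monotonically all the way to its end yields nothing; a guarded variant could fall back to the group's last row:

def get_maxima_safe(x):
    idx = argrelextrema(x['Value'].values, np.greater)[0]
    # no interior maximum (e.g. the group rises to its end): take the last row
    return x.iloc[[idx[0]]] if len(idx) else x.tail(1)

df.groupby('Test').apply(get_maxima_safe)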
Bharath M Shetty