8

Suppose I have a Pandas data frame as follows:

Test Parameter Value

X1     0        0.033285423511615113
X1     1        0.78790279861666179
X1     2        0.79136989638378297
X1     3        0.80063190842016707
X1     4        0.7884653622402551
X1     5        0.78561849214309198...
...
X1     22       0.82241991278171311...
...
X2 ...

I'd like to get the row with Parameter value 3, i.e. the row with the last increasing value before the first drop. Notice that later on we might have higher values (e.g. row 22). Essentially, I'm trying to get the "last" value before the "first" decrease.

Also note that there are multiple Tests, so I probably need to do something like:

myDF.groupby("Test").Something
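
For concreteness, here is a minimal reproducible frame matching the sample above, with values rounded and the elided rows omitted:

import pandas as pd

# sample data from the question, rounded; rows hidden behind "..." are left out
df = pd.DataFrame({
    'Test': ['X1'] * 7,
    'Parameter': [0, 1, 2, 3, 4, 5, 22],
    'Value': [0.033285, 0.787903, 0.791370, 0.800632,
              0.788465, 0.785618, 0.822420],
})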
Bernhard Barker
user1357015

5 Answers

7

Coldspeed nearly has it; to keep only the first run of increases you can use cumprod, or similar, e.g.

In [11]: df[((df.Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1)
Out[11]:
  Test  Parameter     Value
3   X1          3  0.800632

The trick being:

In [12]: (df.Value.diff().fillna(1) > 0)
Out[12]:
0     True
1     True
2     True
3     True
4    False
5    False
6     True
Name: Value, dtype: bool

In [13]: (df.Value.diff().fillna(1) > 0).cumprod()
Out[13]:
0    1
1    1
2    1
3    1
4    0
5    0
6    0
Name: Value, dtype: int64

Note: My df is this:

In [21]: df
Out[21]:
  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X1          5  0.785618
6   X1         22  0.822420
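
If there are multiple Tests (as the comments below ask), a per-group sketch of the same trick, assuming the frame above, is:

# 1 while the value is still rising within each Test
inc = (df.groupby('Test').Value.diff().fillna(1) > 0).astype(int)
# cumprod per group: 1 until each group's first drop, 0 afterwards
keep = inc.groupby(df['Test']).cumprod().eq(1)
df[keep].groupby('Test').tail(1)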
Andy Hayden
  • There's definitely a nicer way to do this, but I don't recall it. – Andy Hayden Oct 22 '17 at 05:29
  • Cumprod is a very nice trick that has eluded me for ever, gotta remember that. +1'd – cs95 Oct 22 '17 at 05:33
  • @cᴏʟᴅsᴘᴇᴇᴅ you can do a similar trick with cumsum if you can make the changes (rows that "change") True... then you can groupby the result for example, I think that's what I was thinking of/trying to recall. – Andy Hayden Oct 22 '17 at 05:36
  • @AndyHayden: I think this is it -- one question though, if there are multiple tests, how do I groupBy test. Would it be: df[((df.groupby("Test"). Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1) – user1357015 Oct 22 '17 at 13:12
6

Use np.diff: it naturally reduces the length of the array by one, and when combined with np.flatnonzero it identifies the ordinal positions just before each drop.

df.iloc[[np.flatnonzero(np.diff(df.Value) < 0)[0]]]

  Test  Parameter     Value
3   X1          3  0.800632

Note:
We can speed this up by accessing the underlying numpy array

df.iloc[[np.flatnonzero(np.diff(df.Value.values) < 0)[0]]]

Explanation

Get differences

np.diff(df.Value)

array([ 0.754618,  0.003467,  0.009262, -0.012167, -0.002847,  0.036802])

Find where differences are negative

np.flatnonzero(np.diff(df.Value) < 0)

array([3, 4])

I want the first one

np.flatnonzero(np.diff(df.Value) < 0)[0]

3

Use double brackets with iloc so we get back a one-row DataFrame rather than a Series

df.iloc[[3]]

  Test  Parameter     Value
3   X1          3  0.800632
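
For contrast, single brackets would pull out a Series instead:

df.iloc[3]     # Series (the row's values, labelled by column)
df.iloc[[3]]   # one-row DataFrame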

The groupby version looks like:

f = lambda d: d.iloc[[np.flatnonzero(np.diff(d.Value.values) < 0)[0]]]
df.groupby('Test').apply(f)

       Test  Parameter     Value
Test                            
X1   3   X1          3  0.800632
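
One caveat (my note, not from the answer): if a group never decreases, flatnonzero returns an empty array and [0] raises an IndexError. A guarded variant could fall back to the group's last row:

def last_before_drop(d):
    # positions whose next value is lower
    drops = np.flatnonzero(np.diff(d.Value.values) < 0)
    # if the group never decreases, fall back to its final row
    pos = drops[0] if len(drops) else len(d) - 1
    return d.iloc[[pos]]

df.groupby('Test').apply(last_before_drop)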
piRSquared
3

Use diff + tail:

df    
  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X1          5  0.785618

df[df.Value.diff().gt(0)].tail(1)    
  Test  Parameter     Value
3   X1          3  0.800632

This will retrieve the last local maximum. If you want the first local maximum, refer to Andy Hayden's solution involving cumprod.


If you're doing this in a groupby operation, it'd be something like (borrowing from Andy):

df.groupby('Test', group_keys=False)\
      .apply(lambda x: x[((x.Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1))
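
To make the distinction concrete on the sample frame (which includes the later rise at Parameter 22):

df[df.Value.diff().gt(0)].tail(1)                            # row 6, the last rise
df[(df.Value.diff().fillna(1) > 0).cumprod() == 1].tail(1)   # row 3, end of the first run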
cs95
3

I think max can do it ...

df.sort_values('Value', ascending=False).drop_duplicates(['Test'])
Out[226]: 
  Test  Parameter     Value
3   X1          3  0.800632

Or

df[df['Value'] == df.groupby(['Test'])['Value'].transform(max)]
Out[227]: 
  Test  Parameter     Value
3   X1          3  0.800632
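
A caveat (my note): both lines return the per-Test maximum, so on the full sample with the Parameter-22 row included they would return that later peak rather than Parameter 3:

# with the Parameter-22 row present, the global max is no longer row 3
df.sort_values('Value', ascending=False).drop_duplicates(['Test'])
#   Test  Parameter     Value
# 6   X1         22  0.822420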

It seems this is what you need... anyway, here is an (admittedly ugly) way to correct my old post:

df1 = df.iloc[np.flatnonzero(df.Value.diff().fillna(1) > 0)].reset_index()
df1.groupby(df1['index'].diff().ne(1).cumsum()).last().iloc[0]
Out[289]: 
index               3
Test               X1
Parameter           3
Value        0.800632
Name: 1, dtype: object
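
The run-labelling step works because positions within one unbroken rise differ by exactly 1; a quick illustration (example values are mine):

s = pd.Series([0, 1, 2, 3, 6])   # positions where the value rose
s.diff().ne(1).cumsum()
# 0    1
# 1    1
# 2    1
# 3    1
# 4    2    <- position 6 starts a new run, so it gets a new label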

For the groupby case:

l = []
for _, dfs in df.groupby('Test'):
    df1 = dfs.iloc[np.flatnonzero(dfs.Value.diff().fillna(1) > 0)].reset_index()
    l.append(df1.groupby(df1['index'].diff().ne(1).cumsum()).last().iloc[0].to_frame().T)

pd.concat(l, axis=0)
BENY
3

Also, from scipy's argrelextrema we can do the following (from finding local maxima):

from scipy.signal import argrelextrema
maxInd = argrelextrema(df['Value'].values, np.greater)
df.iloc[maxInd[0][:1]]
  Test  Parameter     Value
3   X1          3  0.800632

A groupby solution, if you have a dataframe like this:

  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X2          5  0.785618
6   X2         22  0.822420
7   X2          5  0.785618

def get_maxima(x):
    return x.iloc[argrelextrema(x['Value'].values,np.greater)[0][:1]]

df.groupby('Test').apply(get_maxima)

Output:

       Test  Parameter     Value
Test
X1   3   X1          3  0.800632
X2   6   X2         22  0.822420
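
One caveat (my note, not from the answer): argrelextrema with np.greater only reports strict interior maxima, so a group that rises monotonically all the way to its end yields nothing; a guarded variant could fall back to the group's last row:

def get_maxima_safe(x):
    idx = argrelextrema(x['Value'].values, np.greater)[0]
    # no interior maximum (e.g. the group rises to its end): take the last row
    return x.iloc[[idx[0]]] if len(idx) else x.tail(1)

df.groupby('Test').apply(get_maxima_safe)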
Bharath M Shetty