
How do I find the first of several minimum values in a dataset? Ultimately, I want to find, sequentially, values that are at least 2 greater than the running minimum.

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

I would like to identify df['value'][0], or simply (0.6), as the first minimum in this array. Then identify df['value'][4], or (2.8), as the value at least 2 greater than the first identified minimum (0.6).

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})
df['loc_min'] = df.value[(df.value.shift(1) >= df.value) & (df.value.shift(-1) >= df.value)]
df['loc_min'] = df.groupby(['ID'], sort=False)['loc_min'].ffill()
df['condition'] = df['value'] >= df['loc_min'] + 2

This works for other datasets, but not when a minimum comes first: df.value.shift(1) is NaN for the first row, so the comparison is False and that row is never flagged as a local minimum.

The ideal output would be:

    ID  value loc_min condition
0   1   0.6   nan     False
1   1   1.5   0.6     False
2   1   1.6   0.6     False
3   1   1.2   0.6     False
4   1   2.8   0.6     True
5   1   0.3   0.3     False
6   1   0.2   0.2     False

As suggested in a comment, a loop would be a better way to go about this.

Ramy Saad
  • Are you asking how to find local minima in a 1D array? If so, is one of the answers to [this question](https://stackoverflow.com/questions/4624970/) (or one of the others linked from there) what you're looking for? – abarnert Aug 19 '18 at 23:44
  • Please add in your expected output to make it clear what it is you want. – cs95 Aug 19 '18 at 23:45
  • I should point out that in general, in Numpy, you don't usually find "the first of…", you find "all of…" (maybe even in parallel), and then just use the first one or vectorize (or sometimes iterate over) all of them. So, if short-circuiting at the first one is important for correctness, or is expected to give you more performance gain than vectorizing does, you may need to loop. – abarnert Aug 19 '18 at 23:45
  • Can you explain why the first value is NaN? Also, what if the array is [1.5, 0.6, ...]? Where 0.6 is the second element? – cs95 Aug 19 '18 at 23:55
  • @abarnert thank you for your input & i've updated my question accordingly. Unfortunately, the working data is not a 1d array, but a large dataset. – Ramy Saad Aug 19 '18 at 23:57
  • @coldspeed, because there isn't a value before it to compare it with. If it were the second element, it would have been detected as a minimum, since 1.5 > 0.6. – Ramy Saad Aug 19 '18 at 23:59
  • 0.3 is detected to be smaller than 0.6, and 0.2 is smaller than 0.3. So why isn't the entire row 0.2? I'm not clear on the thought process here. – cs95 Aug 20 '18 at 00:01
  • If it’s not a 1D array, see the links on that question about multi-dimensional arrays. Are any of _them_ what you want? – abarnert Aug 20 '18 at 00:18

1 Answer


Seems like you need cummin and a simple comparison:

df['cummin_'] = df.groupby('ID').value.cummin()
df['condition'] = df.value >= df.cummin_ + 2


    ID  value   cummin_ condition
0   1   0.6     0.6     False
1   1   1.5     0.6     False
2   1   1.6     0.6     False
3   1   1.2     0.6     False
4   1   2.8     0.6     True
5   1   0.3     0.3     False
6   1   0.2     0.2     False

Another option is to use expanding. Take, for example,

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2,0.4,2.9]})

Then

df.groupby('ID').value.expanding(2).min()

    ID   
1   0    NaN
    1    0.6
    2    0.6
    3    0.6
    4    0.6
    5    0.3
    6    0.2
2   7    NaN
    8    0.4

The expanding version yields NaN for the first row of each group, while cummin includes the first value itself. It's just a matter of how you want the first row to be interpreted.
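For instance, a sketch of how the expanding version could reproduce the ideal output from the question, assuming the NaN-in-the-first-row convention is what's wanted (a comparison against NaN simply comes out False):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1],
                   'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

# expanding(2) leaves the first row of each group as NaN. The
# groupby/expanding result carries a (ID, row) MultiIndex, so drop
# the ID level to align it back with the original frame.
df['loc_min'] = (df.groupby('ID').value
                   .expanding(2).min()
                   .reset_index(level=0, drop=True))
df['condition'] = df.value >= df.loc_min + 2
```

Here row 0 gets loc_min = NaN and condition = False, rows 1-4 get loc_min = 0.6, and only row 4 (2.8 >= 0.6 + 2) is flagged, matching the ideal output in the question.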

rafaelc