
I have this timeseries df:

                    Current
2018-09-01 00:00      -0.01
2018-09-01 00:01      -0.03
2018-09-01 00:02      -0.01
2018-09-01 00:03       0.03
2018-09-01 00:04      -0.02
2018-09-01 00:05      -0.04
2018-09-01 00:06       0.05

I am trying to find the first instance of a Current value being > 0.01. If I use

findValue = (df['Current'] > 0.01).idxmax()

I get:

2018-09-01 00:03 0.03.

However, I would like to ignore the first 5 rows, so that the result is

 2018-09-01 00:06       0.05

I have tried using shift():

findValue = (df['Current'] > 0.01).shift(5).idxmax()

But this doesn't seem right...

warrenfitzhenry

1 Answer


You can use iloc to select all rows except the first 5 by position:

N = 5
findValue = (df['Current'].iloc[N:] > 0.01).idxmax()
print (findValue)
2018-09-01 00:06
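
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes the index is a DatetimeIndex with one-minute steps; the names `first_label` and `first_value` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the one-minute DatetimeIndex is an assumption.
idx = pd.date_range("2018-09-01 00:00", periods=7, freq="min")
df = pd.DataFrame(
    {"Current": [-0.01, -0.03, -0.01, 0.03, -0.02, -0.04, 0.05]},
    index=idx,
)

N = 5  # number of leading rows to ignore
mask = df["Current"].iloc[N:] > 0.01   # boolean mask over the remaining rows only
first_label = mask.idxmax()            # label of the first True in the sliced mask
first_value = df.loc[first_label, "Current"]  # look up the matching value by label
```

Because idxmax returns only the index label, the value itself has to be looked up separately with loc.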

Another idea is to create a second boolean mask from np.arange and the length of the DataFrame, then combine the two masks with &:

import numpy as np

m1 = df['Current'] > 0.01
m2 = np.arange(len(df)) >= 5
findValue = (m1 & m2).idxmax()
print (findValue)
2018-09-01 00:06

If you need to select by a value in the DatetimeIndex instead:

findValue = (df['Current'].loc['2018-09-01 00:05':] > 0.01).idxmax()
print (findValue)
2018-09-01 00:06:00

m1 = df['Current'] > 0.01
m2 = df.index >= '2018-09-01 00:05'
findValue = (m1 & m2).idxmax()
print (findValue)
2018-09-01 00:06:00

But note that if no value matches, idxmax returns the label of the first row (the first False value):

m1 = df['Current'] > 5.01
m2 = np.arange(len(df)) >= 5
findValue = (m1 & m2).idxmax()

print (findValue)
2018-09-01 00:00:00
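
The reason is that idxmax returns the label of the first occurrence of the maximum, and in an all-False mask the maximum is False. A small sketch of that behaviour, using the sample data from the question with an assumed one-minute DatetimeIndex (variable names are illustrative):

```python
import pandas as pd

# Sample data from the question; the DatetimeIndex is an assumption.
idx = pd.date_range("2018-09-01 00:00", periods=7, freq="min")
current = pd.Series([-0.01, -0.03, -0.01, 0.03, -0.02, -0.04, 0.05], index=idx)

mask = current > 5.01      # no value is this large, so the mask is all False
has_match = mask.any()     # False: nothing satisfies the condition
fallback = mask.idxmax()   # label of the first row, which is not a real match
```

Checking `mask.any()` before calling idxmax is one way to tell a real hit from this fallback.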

A possible solution is to use next with iter, which lets you supply a default for the no-match case:

m1 = df['Current'] > 5.01
m2 = np.arange(len(df)) >= 5
findValue = next(iter(df.index[m1 & m2]), 'no exist')

print (findValue)
no exist
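
If this lookup is needed in several places, the mask logic can be wrapped in a small helper. The sketch below is only one way to do it: `first_label_after` and its parameters are hypothetical names, not part of pandas, and it returns a caller-supplied default when nothing matches.

```python
import numpy as np
import pandas as pd

def first_label_after(series, threshold, skip=0, default=None):
    """Return the label of the first value > threshold, ignoring the first `skip` rows.

    Hypothetical helper for illustration; returns `default` if nothing matches.
    """
    # Combine the value condition with a positional condition on the row number.
    mask = (series > threshold).to_numpy() & (np.arange(len(series)) >= skip)
    if not mask.any():
        return default
    # argmax on a boolean array gives the position of the first True.
    return series.index[mask.argmax()]

# Sample data from the question; the DatetimeIndex is an assumption.
idx = pd.date_range("2018-09-01 00:00", periods=7, freq="min")
current = pd.Series([-0.01, -0.03, -0.01, 0.03, -0.02, -0.04, 0.05], index=idx)

found = first_label_after(current, 0.01, skip=5)     # match after the skipped rows
missing = first_label_after(current, 5.01, skip=5)   # no match, so the default
```

Returning None (or another sentinel) rather than a string keeps the result easy to test with `is None`.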

If performance is important, see this nice Q/A by @jpp: Efficiently return the index of the first value satisfying condition in array.

jezrael