So if I understand you correctly, you're looking for the direction of the first deviation that is larger than a
in one direction or b
in the other (you said they could be variable).
Since the values are 1 and 0, which are essentially the same as TRUE and FALSE, you're saying you want a series where TRUE means the value increases by a
before it decreases by b
, FALSE otherwise.
Note that when iterating over a series like this, this is almost always faster to keep it as a series than it is to cast it to a list. And lambdas and def functions are equivalent, apart from syntax. Your real speed ups come either from vectorization, or from being smarter about what you're doing.
In this case, what you're doing is much more than you'd need to. You're calculating the minimum and maximum value multiple times, even though they'll probably stay the same. You're also not going it by using the optimized methods for it, but with a custom function. Just use se.min(), se.max()
and compare x
directly.
In addition, you're checking the index of the first element where something is true, but you're looking at the entire series instead of quitting as soon as you find the first match. You're essentially doing a lot of unnecessary extra work!
Finally, you're looking at the array from the start, instead of from x
onwards, is that on purpose? I assume it isn't, because of the example you gave.
In any case, this answer pointed me towards np.argmax
, which returns the index of the first element where something is true if the passed numpy array contains only booleans. The only problem is that it returns 0 if no such value exists.
If we know the index, you could do two things: calculate the index for the lower and upper bound and see which occurs first, or calculate the index in an OR loop and then access the array to see which one was right. Performance-wise, I think they should be similar, but it depends how far you want to go!
import numpy as np
import pandas as pd
lower = 1; upper = 1
se = pd.Series([3550.1, 3551.2, 3550.7, 3551.3, 3550.2, 3549.0, 3549.3, 3548.7, 3549.8, 3545.4])
# I put this in a function to prevent polluting the global scope with `counter`
def calculate(se):
counter = 0
se_values = se.values
se_len = len(se) # otherwise you get an error if `np.argmax` is applied to an empty series
def goes_up_before_down(x):
nonlocal counter
counter += 1
if counter == se_len: return False
next_values = se_values[counter:]
idx_first_larger_value = np.argmax(next_values >= x + upper)
idx_first_smaller_value = np.argmax(next_values <= x - lower)
if idx_first_larger_value == 0 and x + upper >= next_values.max():
# There is no larger number in the rest of the series
# we can only go down from here
return False
if idx_first_smaller_value == 0 and x - lower <= next_values.min():
# There is no smaller number in the rest of the series
# we can only go down from here
return True
return idx_first_larger_value < idx_first_smaller_value
return se.apply(goes_up_before_down)
and this yields:
Out[2]:
0 True
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 False
9 False
dtype: bool
which can be cast into
calculate(se).astype(np.int64)
Out[3]:
0 1
1 0
2 0
3 0
4 0
5 0
6 0
7 1
8 0
9 0
dtype: int64