I have the following dataframe:
Price, Volume
100, 45656
101, 67563
103, 755
...
...
6543, 67567
6544, 7654
Each entry in the Price column is unique and there are several thousand rows. The goal is to identify the low-volume prices over a rolling range of rows. In other words, I'm not trying to identify the single lowest volume in the whole dataframe; I'm identifying many low-volume rows over a rolling 'window'.
Let's say I set the rolling window to 50. I compare the current volume value to the 50 volume values above it and the 50 volume values below it. If the current volume value is the lowest within that range, I save the corresponding price to a separate list. I then move one row down and once again check whether the current volume value is smaller than the 50 above and below it.
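To make the intended behaviour concrete, here is a minimal toy example (my own illustration, with made-up numbers and a window of 2 instead of 50): a price qualifies if its volume is the smallest among itself, the 2 volumes above it, and the 2 volumes below it.

```python
# Toy data: 'prices' and 'volumes' are hypothetical illustration values.
volumes = [45656, 67563, 755, 90000, 80000, 120, 500]
prices = [100, 101, 103, 104, 105, 106, 107]

rolling_window = 2  # 2 rows above and 2 rows below
lows = []
for i in range(rolling_window, len(volumes) - rolling_window):
    # Slice covering the current row plus rolling_window rows on each side
    window = volumes[i - rolling_window : i + rolling_window + 1]
    if volumes[i] == min(window):
        lows.append(prices[i])

print(lows)  # only price 103 (volume 755) is the minimum of its window
```

Here 103 qualifies because 755 is smaller than the two volumes above and below it, while 106 (volume 120) does not, since it sits too close to the end of the list to have two rows below it.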
My code below works correctly to accomplish this task:
rolling_window = 50
total_rows = len(df.index)
current_row = rolling_window
smallest_values = []
while current_row < total_rows - rolling_window:
    is_smallest = True
    # Compare against the 50 rows above and the 50 rows below
    for comparison_row in range(1, rolling_window + 1):
        if df.iloc[current_row]['Volume'] > df.iloc[current_row - comparison_row]['Volume'] or \
           df.iloc[current_row]['Volume'] > df.iloc[current_row + comparison_row]['Volume']:
            is_smallest = False
            break
    if is_smallest and df.iloc[current_row]['Price'] not in smallest_values:
        smallest_values.append(df.iloc[current_row]['Price'])
    current_row += 1
print(smallest_values)
My problem is that it is extremely slow on large dataframes (several thousand rows). I'm sure there must be a more efficient way of accomplishing what I'm trying to do. I suspect I'm making the algorithm do more work than necessary, but I haven't been able to think of another way to do it.
I would be very grateful if anyone could suggest a faster/more efficient way of doing this.