The larger scope of this question is that I have time series stock data that is collected hourly, and I want to track the data over several timeframes (1 hour, 1 day, 1 week, 1 month, etc.).
Certain functions require me to pick out specific rows (for example, a single record at the end of each month, for the sake of determining true standard deviation). Each of those timeframes therefore spans a given number of 'bars' (rows), and I've already made a column in my dataframe that tracks that number of rows per period. Because of holidays, and because the length of a month varies, the number of rows per period also varies over time; otherwise it would just be a simple shift of a static value (like 7 for the daily timeframe on hourly data, since there are 6.5 hours in a trading day).
I need to rebuild a boolean mask for the entire dataframe every time a new row of data comes in, so that my lookbacks stay accurate. How can I do this without manually looping over the entire dataframe every time I get new data? Keep in mind that the mask has to be built backwards, from the latest entry towards the earliest: the latest row is always selected, and each selected row's `bars` value says how many rows to step back to reach the next selected row.
Here's a much simpler dataframe example to work with:
import numpy as np
import pandas as pd
df = pd.DataFrame({'state': np.arange(1, 11), 'bars': [1, 2, 3, 3, 3, 4, 5, 5, 4, 3]})
# Results in:
   state  bars
0      1     1
1      2     2
2      3     3
3      4     3
4      5     3
5      6     4
6      7     5
7      8     5
8      9     4
9     10     3
# I would expect `boolean_mask` to look like this:
0 False
1 True
2 False
3 False
4 False
5 False
6 True
7 False
8 False
9 True
# Therefore `df.loc[boolean_mask, 'state'].reset_index(drop=True)` becomes:
0 2
1 7
2 10
# If row #9 did not exist, the resulting mask would be different.
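
For reference, here is a minimal sketch of the backward walk described above, written as a plain loop. The helper name `backward_jump_mask` is hypothetical and only there to pin down the expected behaviour; it still loops, although only over the selected rows rather than over every row. It assumes every `bars` value is a positive integer.

```python
import numpy as np
import pandas as pd


def backward_jump_mask(bars):
    """Hypothetical helper: select the last row, then keep stepping back
    by the `bars` value of the row that was just selected."""
    bars = np.asarray(bars)
    mask = np.zeros(len(bars), dtype=bool)
    i = len(bars) - 1
    while i >= 0:
        mask[i] = True
        step = int(bars[i])
        if step <= 0:
            # Assumption: bars is always positive; stop rather than loop forever.
            break
        i -= step
    return mask


df = pd.DataFrame({'state': np.arange(1, 11),
                   'bars': [1, 2, 3, 3, 3, 4, 5, 5, 4, 3]})

boolean_mask = backward_jump_mask(df['bars'])
selected = df.loc[boolean_mask, 'state'].tolist()

# Same data without the last row: the walk starts one row earlier,
# so it lands on a different set of rows.
shorter = df.iloc[:-1]
shorter_mask = backward_jump_mask(shorter['bars'])
shorter_selected = shorter.loc[shorter_mask, 'state'].tolist()
```

On the example data, `selected` is `[2, 7, 10]`, and dropping the last row changes the result to `[2, 5, 9]`, which is why the mask has to be rebuilt whenever a new row arrives.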