The larger scope of this question is that I have time series stock data that is collected hourly, and I want to track the data over several timeframes (1 hour, 1 day, 1 week, 1 month, etc.).

Certain functions require me to pick out specific rows (for example, a single record at the end of each month, in order to compute a true standard deviation). Each timeframe therefore spans a given number of 'bars' (rows). I've already added a column to my dataframe that tracks the number of rows per period. Because of holidays, and because months differ in length, the number of rows per period varies over time; otherwise it would be a simple shift by a static value (such as 7 for the daily timeframe on hourly data, since a trading day is 6.5 hours).

I need to rebuild a boolean mask for the entire dataframe every time a new row of data comes in, so that my lookbacks stay accurate. How can I do this without manually looping over the entire dataframe each time new data arrives? Keep in mind that the mask has to be built backwards, from the latest entry towards the earliest.

Here's a much simpler dataframe example to work with:

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': np.arange(1, 11), 'bars': [1, 2, 3, 3, 3, 4, 5, 5, 4, 3]})

# Results in:

    state   bars
0   1       1
1   2       2
2   3       3
3   4       3
4   5       3
5   6       4
6   7       5
7   8       5
8   9       4
9   10      3

# I would expect `boolean_mask` to look like this:

0   False
1   True
2   False
3   False
4   False
5   False
6   True
7   False
8   False
9   True

# Therefore `df.loc[boolean_mask, 'state']` becomes:

1   2
6   7
9   10

# If row #9 did not exist, the resulting mask would be different.

1 Answer

Figured it out; it was simpler than I imagined. The solution is to walk the 'bars' column in reverse and apply a function that returns True for the current row, then returns False for the next bars - 1 rows while decrementing a counter to zero, and repeats. Reversing the result again gives the boolean mask.

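def OnPeriod(bars, r):
    # r is a mutable dict shared across calls; r['bars'] counts rows left to skip.
    if r['bars'] == 0:
        # Period boundary: mark this row, then skip the next bars - 1 rows.
        r['bars'] = bars - 1
        return True

    r['bars'] -= 1
    return False

# Walk from the latest row backwards, then restore the original order.
mask = df['bars'][::-1].apply(OnPeriod, args=({'bars': 0},))[::-1].values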
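
For comparison, here is a minimal sketch of the same backward walk written as an explicit loop that jumps from one period boundary straight to the next, so it takes one step per period rather than one per row. The name `period_end_mask` is made up for illustration, and the `max(..., 1)` guard for non-positive bar counts is an assumption of mine, not something the `apply` version does.

```python
import numpy as np
import pandas as pd


def period_end_mask(bars):
    """Walk backwards from the latest row, marking one row per period."""
    values = np.asarray(bars)
    mask = np.zeros(len(values), dtype=bool)
    i = len(values) - 1
    while i >= 0:
        mask[i] = True
        # Jump back by this row's bar count. The max(..., 1) guard is an
        # assumption: it prevents an endless loop on a zero or negative count.
        i -= max(int(values[i]), 1)
    return mask


df = pd.DataFrame({'state': np.arange(1, 11),
                   'bars': [1, 2, 3, 3, 3, 4, 5, 5, 4, 3]})

mask = period_end_mask(df['bars'])
print(df.loc[mask, 'state'].tolist())  # [2, 7, 10]

# Without the last row, the walk starts one row earlier and the mask changes.
shorter = df.iloc[:-1]
shorter_mask = period_end_mask(shorter['bars'])
print(shorter.loc[shorter_mask, 'state'].tolist())  # [2, 5, 9]
```

On the example dataframe this selects the same rows as the `apply` version. The mask still has to be rebuilt whenever a row is appended, because the walk starts from the newest row.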
  • Glad you found a solution, but since you are already dealing with a time series, you might be better off using that together with Pandas' built-in time and date functions. Have a look at these examples: [find the end of a month](https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-pandas-dataframe-series), [custom business days](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-custombusinessdays). – amain May 14 '20 at 23:25
  • Thanks amain, I can't exactly rely on those functions alone to do what I want for many reasons, but I have implemented that type of boolean masking as well as stock-exchange-specific methods in order to determine the 'bars' feature to begin with. – SnakeWasTheNameTheyGaveMe May 14 '20 at 23:54