
I'm attempting to run `seasonal_decompose` on my pandas DataFrame, but I've hit an error I can't get past. My time series contains chronological gaps, which makes sense because the data is stock prices (hours when the market is closed create these gaps, as do differing month lengths, etc.). The data in and of itself can be thought of as contiguous; however, pandas doesn't seem willing to infer any frequency.

All of my timeframe data (1m, 5m, 15m ... 1D, 1M) is populated correctly, but its frequency is set to None. My algorithm creates an empty DataFrame on instantiation and adds values to it via `loc` as data arrives during the algorithm's runtime. Perhaps that's ultimately why the frequency is None (pandas is typically used after all of the data has been generated).
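A minimal sketch of that hypothesis, assuming a hypothetical `close` column and made-up prices: rows added one at a time with `.loc` end up in a `DatetimeIndex` whose `freq` is None even when the timestamps are perfectly regular, while `inferred_freq` can still detect the spacing.

```python
import pandas as pd

# Empty frame with a DatetimeIndex, filled incrementally as in the question
# ("close" is a hypothetical column name used only for illustration)
prices = pd.DataFrame(columns=["close"], index=pd.DatetimeIndex([]), dtype="float64")

# Simulate five regularly spaced daily bars arriving one at a time
for day, price in zip(pd.date_range("2019-05-13", periods=5, freq="D"),
                      [10.0, 10.5, 10.2, 10.8, 11.0]):
    prices.loc[day, "close"] = price

freq_before = prices.index.freq        # None: enlarging via .loc does not set freq
inferred = prices.index.inferred_freq  # "D": the spacing itself is regular
print(freq_before, inferred)

# Because the spacing is regular, asfreq attaches the frequency without adding rows
prices = prices.asfreq(inferred)
print(prices.index.freqstr)
```

This only works because the sketch has no gaps; with real gaps, `inferred_freq` stays None.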

I've tried explicitly setting the frequency right before seasonal_decompose using:

data.index.freq = data.index.freq or to_offset(timeframe.Timespan).freqstr

where `timeframe.Timespan` is a Python `timedelta` object. The resulting string is correct ("D", because the timespan happens to be daily), but the following error occurs:

ValueError: Inferred frequency None from passed values does not conform to passed frequency D
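The failure can be reproduced in isolation. A minimal sketch using made-up dates with one missing day: the `timedelta` converts to `"D"` as described, but attaching that frequency to an index with a gap raises a `ValueError` like the one above, because pandas validates the actual spacing against the requested frequency.

```python
from datetime import timedelta

import pandas as pd
from pandas.tseries.frequencies import to_offset

# The question's conversion: a one-day timedelta becomes the offset string "D"
freq = to_offset(timedelta(days=1)).freqstr
print(freq)

# Hypothetical daily dates with 2018-01-03 missing
dates = ["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"]
irregular_inferred = pd.DatetimeIndex(dates).inferred_freq  # None: spacing is irregular

try:
    # Requesting freq="D" makes pandas check every step is exactly one day
    pd.DatetimeIndex(dates, freq=freq)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```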

So I can't explicitly set the frequency on my index? How do I solve this? And how is the integer frequency passed to `seasonal_decompose` derived from these strings anyway? I'm also not permitted to change the value of `data.index.inferred_freq`, so that's not an option.

  • Have you tried `data.asfreq`? I suppose in this case it'd be `data.asfreq(data.index.freq or to_offset(timeframe.Timespan).freqstr)` – aiguofer May 21 '19 at 21:10
  • Won't that change the data relative to the original? I'd like to keep my timestamps the same, because I have to compare my predictions to the original dataset. Or would I call `asfreq` and then `dropna`? – SnakeWasTheNameTheyGaveMe May 21 '19 at 21:14
  • _"My time series data contains chronological gaps"_ then you can't set the frequency as-is, because the index is not actually of the given frequency. To set the frequency, every element in your index must be separated by `freq` units; i.e. if your index jumps from `'2018-01-01'` to `'2018-01-03'`, you can't set the frequency to `'D'` because you're missing a day. – root May 21 '19 at 21:18
  • @JakeTheSnake You need to have a continuous range for `freq` to work. Calling `asfreq` would indeed adjust your index and create entries for any missing time periods (using `None` as the default value). If you call `dropna` you'd lose your `freq` again. – aiguofer May 21 '19 at 21:36
  • What would you suggest I do in this situation? Should I fill in the NAs with a value? If so, how? At first thought there are many ways to treat the data, all of which would affect the predictions. If I keep the NAs, will I even be able to run `seasonal_decompose`? – SnakeWasTheNameTheyGaveMe May 21 '19 at 22:13
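A minimal sketch of the behaviour described in these comments, using made-up prices with one missing day: `asfreq` inserts the missing timestamp as a NaN row, `dropna` brings back the original rows but loses `freq` again, and forward-filling is shown as one possible way (an assumption, not the only choice) to keep a regular index with no NaNs.

```python
import pandas as pd

# Hypothetical closing prices with 2018-01-03 missing
idx = pd.DatetimeIndex(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"])
prices = pd.DataFrame({"close": [10.0, 10.5, 10.8, 11.0]}, index=idx)

# asfreq builds a complete daily index; the missing day appears as a NaN row
regular = prices.asfreq("D")
n_missing = regular["close"].isna().sum()
print(regular)

# dropna restores the original rows, but the resulting index has no freq
restored = regular.dropna()
print(restored.index.freq)

# One option: carry the last known close forward; the index keeps freq "D"
filled = regular.ffill()
print(filled.index.freqstr)
```

Forward-filling repeats the last close across closed periods; interpolation is another option, and each choice will shape the decomposition differently.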

1 Answer


Sounds like what you need is DataFrame.asfreq:

data = data.asfreq(data.index.freq or to_offset(timeframe.Timespan).freqstr)
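For stock data specifically, a business-day frequency may fit better than calendar days. A minimal sketch with made-up prices (the `"B"` alias is an assumption about what suits this data, not part of the answer above): `asfreq("D")` adds weekend rows full of NaN, while `asfreq("B")` only covers weekdays, although exchange holidays would still appear as NaN. `reindex` then maps anything computed on the regular index back to the original timestamps, which addresses the concern about comparing against the original dataset.

```python
import pandas as pd

# Hypothetical closes: Thu 2018-01-04, Fri 01-05, then Mon 01-08, Tue 01-09
idx = pd.DatetimeIndex(["2018-01-04", "2018-01-05", "2018-01-08", "2018-01-09"])
data = pd.DataFrame({"close": [10.0, 10.5, 10.8, 11.0]}, index=idx)

calendar_daily = data.asfreq("D")  # adds Saturday and Sunday as NaN rows
business_daily = data.asfreq("B")  # weekdays only, so no rows are added here

print(len(calendar_daily), len(business_daily))

# Results on the regular index can be mapped back to the original timestamps
back = business_daily.reindex(data.index)
```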
aiguofer