
The following links were investigated but didn't answer my question or fix my problem: First, Second.

Due to confidentiality issues I cannot post the actual decomposition, but I can show my current code and give the length of the data set. If this isn't enough, I will remove the question.

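import numpy as np
from statsmodels.tsa import seasonal

def stl_decomposition(data):
    # Flatten the (years x 12 months) input into a single monthly series
    data = np.asarray(data, dtype=float).ravel()
    # Classical moving-average decomposition (not STL) with a 12-month period;
    # older statsmodels releases call this parameter freq instead of period
    decomposed = seasonal.seasonal_decompose(x=data, period=12)

    seas = decomposed.seasonal
    trend = decomposed.trend
    res = decomposed.resid
    return seas, trend, res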

When plotted, the decomposition looks correct for an additive model. However, the trend and residual arrays have NaN values for the first and last 6 months. The current data set has 10 * 12 = 120 monthly values. Ideally this should work for as little as 2 years of data.

Is this still too small, as suggested in the first link? That is, do I need to extrapolate the missing points myself?

EDIT: It seems that half of the frequency is always NaN at both ends of the trend and residual. This still holds when I decrease the size of the data set.
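The EDIT matches how classical decomposition estimates the trend: a centered moving average needs a full window around each point, so with a 12-month period it cannot produce values for the first and last 6 points, and the residual inherits those NaNs. Below is a minimal NumPy sketch of that filter, assuming the 2 x 12 weighting [0.5, 1, ..., 1, 0.5] / 12 that, to my understanding, statsmodels uses for an even period. `centered_trend` is a hypothetical helper, not a statsmodels function.

```python
import numpy as np

def centered_trend(x, period):
    """Centered moving-average trend (sketch of classical decomposition).

    Even period: 2 x period MA with weights [0.5, 1, ..., 1, 0.5] / period.
    Odd period: plain average over `period` points.
    """
    x = np.asarray(x, dtype=float)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2  # points that cannot be estimated at each end
    trend = np.full(x.shape, np.nan)
    # mode="valid" keeps only positions where the whole window fits in the data
    trend[half:len(x) - half] = np.convolve(x, weights, mode="valid")
    return trend

# Two years of synthetic monthly data: linear trend plus a yearly cycle
months = np.arange(24)
data = 10 + 0.1 * months + np.sin(2 * np.pi * months / 12)
trend = centered_trend(data, 12)
print(np.isnan(trend[:6]).all(), np.isnan(trend[-6:]).all(), np.isnan(trend).sum())  # True True 12
```

Because the number of lost points depends only on the period, shrinking the data set does not change it, which is exactly what the EDIT observes.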

Marciano

2 Answers


According to this GitHub link, another user had a similar question, and the issue has since been "fixed": to avoid NaNs, an extra parameter can be passed.

# older statsmodels releases use freq=12 instead of period=12
decomposed = seasonal.seasonal_decompose(x=data, period=12, extrapolate_trend='freq')

It will then use linear least squares to extrapolate the trend at both ends instead of leaving NaNs. (Source)
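Conceptually, the extrapolation fits a straight line to the valid trend values nearest each end and extends it over the NaN positions. The sketch below illustrates that idea with `np.polyfit`; `extrapolate_trend_ends` is a hypothetical helper and a simplification, not statsmodels' internal code, which differs in details such as exactly how many points it fits.

```python
import numpy as np

def extrapolate_trend_ends(trend, n_points):
    """Fill leading/trailing NaNs by linear least-squares extrapolation.

    Simplified illustration only; assumes the NaNs sit at the two ends
    and at least two valid values exist.
    """
    trend = np.asarray(trend, dtype=float).copy()
    t = np.arange(len(trend))
    valid = np.flatnonzero(~np.isnan(trend))
    first, last = valid[0], valid[-1]

    # Fit a line to the n_points earliest valid values and extend it backwards
    head = valid[:n_points]
    slope, intercept = np.polyfit(t[head], trend[head], deg=1)
    trend[:first] = slope * t[:first] + intercept

    # Fit a line to the n_points latest valid values and extend it forwards
    tail = valid[-n_points:]
    slope, intercept = np.polyfit(t[tail], trend[tail], deg=1)
    trend[last + 1:] = slope * t[last + 1:] + intercept
    return trend

# A linear trend with 6 NaNs at each end, as a 12-month moving average leaves it
t = np.arange(24)
true_trend = 5 + 2.0 * t
trend = true_trend.copy()
trend[:6] = np.nan
trend[-6:] = np.nan
filled = extrapolate_trend_ends(trend, n_points=12)
print(np.isnan(filled).any(), np.allclose(filled, true_trend))  # False True
```

On a truly linear trend the extrapolation is exact; on real data the filled ends are only as good as the straight-line assumption.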

In hindsight, the information was right there in the documentation and clearly explained, but I completely missed or misinterpreted it. I am answering my own question so that anyone with the same issue is spared the adventure I had.

Marciano

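According to the parameter definition below, setting extrapolate_trend to anything other than 0 changes how the trend is estimated at the ends of the series: those values come from a linear least-squares extrapolation rather than from the moving average. I ran into this when I had only a few observations for the estimation.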

extrapolate_trend : int or 'freq', optional
    If set to > 0, the trend resulting from the convolution is
    linear least-squares extrapolated on both ends (or the single one
    if two_sided is False) considering this many (+1) closest points.
    If set to 'freq', use `freq` closest points. Setting this parameter
    results in no NaN values in trend or resid components.
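The caveat about few observations can be made concrete: the centered moving average leaves period // 2 points undefined at each end regardless of series length, so the share of the trend that comes from extrapolation grows as the series shrinks. A small arithmetic sketch, assuming monthly data with a period of 12 (`extrapolated_fraction` is a hypothetical helper):

```python
def extrapolated_fraction(n_obs, period):
    """Share of trend points the centered moving average leaves as NaN,
    i.e. the points that extrapolate_trend has to fill in."""
    lost_per_end = period // 2  # 6 points per end for monthly data
    return 2 * lost_per_end / n_obs

for years in (2, 5, 10):
    n_obs = 12 * years
    print(f"{years} years: {extrapolated_fraction(n_obs, 12):.0%} of the trend is extrapolated")
# 2 years: 50% ..., 5 years: 20% ..., 10 years: 10% ...
```

With only 2 years of data, half of the trend (and therefore half of the residual) would rest on the straight-line extrapolation.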
Mehran F Langerudi