
I am trying to separate seasonality, trend and residual from timeseries 'XYZ.csv' (sales data collected over 2 years of time).

[XYZ.csv contains 2 columns - date and sales. Date has been set as an index within the code.]

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('XYZ.csv')
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)

# freq=? is the value I do not know how to choose
res = sm.tsa.seasonal_decompose(df.colA.interpolate(), freq=?, model='additive')
resplot = res.plot()
observed = res.observed
seasonality = res.seasonal

This code works fine once a frequency is supplied. My only trouble is understanding how to calculate the frequency for this time series. Is there a predefined way to do it? Thanks in advance for any help or suggestions!

avariant
Analyst17
  • Frequency is a property of your data. If you collected your data month by month, then it has monthly frequency (12 since 1 year has 12 months). – ayhan Apr 18 '18 at 18:22
  • the seasonal_decompose function in the statsmodels library no longer requires a frequency parameter – Sid Kwakkel Feb 05 '21 at 00:28
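
The rule of thumb in the comments above (monthly data, so 12 observations per yearly cycle) can be sketched in code. This is only an illustration: `guess_period` and the `COMMON_PERIODS` table are hypothetical names, and the table is an assumed set of conventional choices, not something defined by pandas or statsmodels. Only `pd.infer_freq` and `pd.date_range` are real pandas calls.

```python
import pandas as pd

# Assumed lookup table (hypothetical): a conventional seasonal period for a
# few pandas frequency codes, e.g. monthly data -> 12 observations per year.
COMMON_PERIODS = {
    "h": 24, "H": 24,              # hourly data, daily cycle
    "D": 7,                        # daily data, weekly cycle
    "W": 52,                       # weekly data, yearly cycle
    "MS": 12, "M": 12, "ME": 12,   # monthly data, yearly cycle
    "QS": 4, "Q": 4, "QE": 4,      # quarterly data, yearly cycle
}


def guess_period(index):
    """Guess a seasonal period from the sampling frequency of a DatetimeIndex."""
    freq = pd.infer_freq(index)    # e.g. "D", "MS", "W-SUN"; None if irregular
    if freq is None:
        return None
    base = freq.split("-")[0]      # drop anchors such as "-SUN" or "-DEC"
    return COMMON_PERIODS.get(base)


# Made-up indexes standing in for df.index
monthly = pd.date_range("2016-01-01", periods=24, freq="MS")
daily = pd.date_range("2016-01-01", periods=730, freq="D")
print(guess_period(monthly), guess_period(daily))
```

This only reflects how often the data were sampled. Whether the sales actually repeat on that cycle still has to be checked against the data.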

1 Answer


A brute-force approach is to try every candidate period and look for the one that minimizes the residuals:

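res_vs_lag = {}
# Start at 2: a period of 1 has no seasonal cycle to estimate.
for p in range(2, 250):
    res = sm.tsa.seasonal_decompose(df.colA.interpolate(), period=p, model='additive')
    # Use the mean, not the sum: resid has NaN edges whose count grows with p.
    res_vs_lag[p] = res.resid.abs().mean()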

Then you can plot the resulting series:

pd.Series(res_vs_lag).plot()

A more elegant approach would rely on autocorrelation or spectral analysis (see, for example, https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.pacf.html).
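
As a rough sketch of both ideas, the snippet below estimates the period of a synthetic daily series in two ways: from the first clear peak of the autocorrelation, and from the strongest peak of the periodogram. To keep dependencies small it uses pandas' `Series.autocorr` and NumPy's FFT rather than the statsmodels functions. The helper names `period_from_acf` and `period_from_periodogram`, the `max_lag` and `min_corr` defaults, and the data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def period_from_acf(series, max_lag=60, min_corr=0.3):
    """Return the first lag >= 2 where the autocorrelation has a clear local peak."""
    detrended = series.diff().dropna()  # first difference removes a slow trend
    acf = [detrended.autocorr(lag=k) for k in range(max_lag + 2)]
    for k in range(2, max_lag + 1):
        if acf[k] >= min_corr and acf[k] > acf[k - 1] and acf[k] > acf[k + 1]:
            return k
    return None  # no clear seasonal peak found


def period_from_periodogram(series):
    """Return the period (in samples) of the strongest peak in the periodogram."""
    detrended = series.diff().dropna().to_numpy()
    detrended = detrended - detrended.mean()
    power = np.abs(np.fft.rfft(detrended)) ** 2
    freqs = np.fft.rfftfreq(len(detrended), d=1.0)  # cycles per sample
    peak = int(np.argmax(power[1:])) + 1            # skip the zero-frequency bin
    return int(round(1.0 / freqs[peak]))


# Synthetic data (made up): two years of daily sales with a linear trend,
# a weekly cycle and some noise.
rng = np.random.default_rng(0)
t = np.arange(728)
index = pd.date_range("2016-01-01", periods=len(t), freq="D")
values = 0.05 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, len(t))
sales = pd.Series(values, index=index)

acf_period = period_from_acf(sales)
fft_period = period_from_periodogram(sales)
print(acf_period, fft_period)
```

On this synthetic series both estimates should recover the weekly period of 7. Real sales data is noisier, so treat the result as a candidate to confirm by plotting the autocorrelation, not as a definitive answer.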

dokteurwho