I am analysing financial time series data and want to perform a seasonal decomposition:

from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd
import datetime
import pandas_datareader as data

# everSince and today are the start and end dates, defined elsewhere
df = data.get_data_yahoo('UGA', start=everSince, end=today)
df_close = df[['Close']]
result = seasonal_decompose(df_close, model='multiplicative')

This raises the following error:

You must specify a period or x must be a pandas object with a PeriodIndex or a DatetimeIndex with a freq not set to None

I know I can set the frequency with df.asfreq(), but financial data do not have a daily frequency (i.e., there is not an entry for every single day): prices are quoted from Monday to Friday only, and sometimes there are holidays.

How can I apply seasonal_decompose to this kind of data? I have also tried df_close.index = df_close.index.to_period('B'), but that did not work.

An example of the df is:

                Close
Date                 
2008-02-28  49.790001
2008-02-29  49.610001
2008-03-03  49.810001
2008-03-04  47.450001
2008-03-05  49.049999
2008-03-06  49.369999
2008-03-07  50.230000
2008-03-10  50.610001
2008-03-11  50.700001
2008-03-12  50.919998
2008-03-13  49.939999
2008-03-14  50.049999
2008-03-17  46.869999
2008-03-18  48.980000
2008-03-19  47.540001
2008-03-20  48.070000
2008-03-24  48.459999
2008-03-25  49.490002
2008-03-26  50.320000
2008-03-27  50.110001
2008-03-28  50.009998
2008-03-31  48.509998
2008-04-01  48.840000
2008-04-02  51.130001
2008-04-03  50.419998
2008-04-04  50.900002
2008-04-07  51.430000
2008-04-08  50.959999
2008-04-09  51.290001
2008-04-10  51.540001

where the index is of type pandas.core.indexes.datetimes.DatetimeIndex.

roschach
  • can you provide example input data? – luigigi Oct 26 '21 at 12:34
  • @luigigi I did: they are downloaded with `data.get_data_yahoo('UGA', start=everSince, end=today)` as pandas dataframe. – roschach Oct 26 '21 at 12:40
  • @luigigi Or were you asking for a hard-coded df? – roschach Oct 26 '21 at 12:43
  • I didn't want to install the pandas-datareader package, so yes. Just something like df.head() – luigigi Oct 26 '21 at 12:45
  • Check if it is ok now – roschach Oct 26 '21 at 12:51
  • I see the problem, but I think you have to manipulate the data so it fits the requirements of seasonal_decompose. For example, you can resample the data and replace the NaN values by interpolating: `result = seasonal_decompose(df_close.resample('1D').asfreq().interpolate(), model='multiplicative')` – luigigi Oct 26 '21 at 12:59
  • Or you can resample it to weekly data if that's okay for you – luigigi Oct 26 '21 at 13:02
  • I'd like to keep a daily time-frame. I was hoping there was some way to handle such financial data, but so far yours is the best way to go. – roschach Oct 26 '21 at 13:18
  • @luigigi Just one more question: do you know if there is a function to add the seasonal and trend component back when forecasting? – roschach Oct 26 '21 at 16:34
  • @FrancescoBoi Does my answer resolve your issue? – Mario Jan 27 '22 at 08:25

1 Answer

Your issue can be solved by:

  • Filling the missing dates in the DataFrame (if you don't have daily data) and setting the corresponding values to 0.
  • Setting the period/frequency of the index so that seasonal_decompose can infer the seasonality:
# import libraries
import numpy as np
import pandas as pd
import datetime as dt
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
print(sm.__version__)

# Generate some data
TODAY = dt.date.today()
SPAN = dt.timedelta(days=107)  # distance from the first sample to today
ONE_DAY = dt.timedelta(days=1)

# Create a pandas DataFrame with three sparse observations
df = pd.DataFrame({'Date': [TODAY-SPAN, TODAY-3*ONE_DAY, TODAY], 'Close': [42, 45, 127]})

#      Date    Close
#0  2021-09-02  42
#1  2021-12-15  45
#2  2021-12-18  127

# Convert the dates, index by them, and put the series on a regular daily grid;
# days without an observation are filled with 0
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').asfreq('D').fillna(0)

#            Close
#Date   
#2021-09-02 42.0
#2021-09-03 0.0
#2021-09-04 0.0
#2021-09-05 0.0
#2021-09-06 0.0
#...    ...
#2021-12-14 0.0
#2021-12-15 45.0
#2021-12-16 0.0
#2021-12-17 0.0
#2021-12-18 127.0
# 108 rows × 1 columns
  • Finally, use seasonal_decompose() to split the time series into its components:
# inspect frequency attribute
print(df.index.freq)  #<Day>

# Reproduce the example for OP and plot output
seasonal_decompose(df, model='additive').plot()

outputs:

img

Here is another output plot you can produce using another answer of mine, if you wish:

img

Note: with zero-filled data the decomposition does not work for model='multiplicative', because the zeros trigger:

ValueError: Multiplicative seasonality is not appropriate for zero and negative values
Mario