
How do I use seasonal_decompose? How do I deal with the various errors that come up while using it, and how can it be used in practice?

sakeesh
  • 919
  • 1
  • 10
  • 24


Get all imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime
from statsmodels.tsa.seasonal import seasonal_decompose

Prepare test data

data = {'Unix Timestamp': ['1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12','1.61888E+12'],
 'Date': ['4/20/2021 0:02','4/20/2021 0:01','4/20/2021 0:00','4/19/2021 23:59','4/19/2021 23:58','4/19/2021 23:57','4/19/2021 23:56','4/19/2021 23:55','4/19/2021 23:54','4/19/2021 23:53','4/19/2021 23:52','4/19/2021 23:51','4/19/2021 23:50','4/19/2021 23:49','4/19/2021 23:48','4/19/2021 23:47','4/19/2021 23:46','4/20/2021 0:02','4/20/2021 0:01','4/20/2021 0:00','4/19/2021 23:59','4/19/2021 23:58','4/19/2021 23:57','4/19/2021 23:56','4/19/2021 23:55','4/19/2021 23:54','4/19/2021 23:53','4/19/2021 23:52','4/19/2021 23:51','4/19/2021 23:50','4/19/2021 23:49','4/19/2021 23:48','4/19/2021 23:47','4/19/2021 23:46'],
 'Symbol': ['BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD','BTCUSD'],
 'Open': [55717.47,55768.94,55691.79,55777.86,55803.5,55690.64,55624.69,55651.82,55688.08,55749.28,55704.59,55779.38,55816.61,55843.69,55880.12,55890.88,0,55717.47,55768.94,55691.79,55777.86,55803.5,55690.64,55624.69,55651.82,55688.08,55749.28,55704.59,55779.38,55816.61,55843.69,55880.12,55890.88,0],
 'High': [55723,55849.82,55793.15,55777.86,55823.88,55822.91,55713.02,55675.92,55730.21,55749.28,55759.27,55779.38,55835.57,55863.89,55916.47,55918.87,0,55723,55849.82,55793.15,55777.86,55823.88,55822.91,55713.02,55675.92,55730.21,55749.28,55759.27,55779.38,55835.57,55863.89,55916.47,55918.87,0],
 'Low': [55541.69,55711.74,55691.79,55677.92,55773.08,55682.56,55624.63,55621.58,55641.46,55688.08,55695.42,55688.66,55769.46,55797.08,55815.99,55826.84,0,55541.69,55711.74,55691.79,55677.92,55773.08,55682.56,55624.63,55621.58,55641.46,55688.08,55695.42,55688.66,55769.46,55797.08,55815.99,55826.84,0]}
df=pd.DataFrame(data)

Perform decomposition

df_seasonal = seasonal_decompose(df)

We get our first error

ValueError: could not convert string to float:

Let's fix this error by converting the Date strings into datetime objects:

df['Date'] = df['Date'].apply(
    lambda x :  datetime.datetime.strptime(str(x),'%m/%d/%Y %H:%M')
)
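As an alternative to the row-by-row strptime call, pandas can parse the whole column at once with pd.to_datetime and an explicit format string. A minimal sketch, assuming the same '%m/%d/%Y %H:%M' layout as the sample data; the three-row DataFrame is only a stand-in for df:

```python
import pandas as pd

# Stand-in for the question's DataFrame: only the Date column matters here
df = pd.DataFrame({'Date': ['4/20/2021 0:02', '4/20/2021 0:01', '4/19/2021 23:59']})

# Vectorised parsing; the format matches month/day/year hour:minute strings
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %H:%M')

print(df['Date'].dtype)
print(df['Date'].iloc[0])
```

Giving the format explicitly avoids pandas having to guess the layout, and it fails loudly if a row does not match.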

Now if you run seasonal_decompose again, you get a new error:

df_seasonal = seasonal_decompose(df)

The new error is:

TypeError: float() argument must be a string or a number, not 'Timestamp'

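To fix this error, pass one column at a time, and make sure that column is numeric: the DataFrame still contains the datetime Date column and text columns such as Symbol, neither of which can be converted to float. Try the decomposition on the Open column: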

df_seasonal = seasonal_decompose(df['Open'])
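To see which columns seasonal_decompose can accept before calling it, you can check the dtypes; only numeric columns convert cleanly to float. A small sketch using DataFrame.select_dtypes on a hypothetical two-row copy of the sample data:

```python
import pandas as pd

# Two rows shaped like the question's data; 'Unix Timestamp' is stored as text
df = pd.DataFrame({
    'Unix Timestamp': ['1.61888E+12', '1.61888E+12'],
    'Date': pd.to_datetime(['4/20/2021 0:02', '4/20/2021 0:01'], format='%m/%d/%Y %H:%M'),
    'Symbol': ['BTCUSD', 'BTCUSD'],
    'Open': [55717.47, 55768.94],
    'High': [55723.0, 55849.82],
    'Low': [55541.69, 55711.74],
})

print(df.dtypes)

# Only numeric (float/int) columns are candidates for seasonal_decompose
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```

Date and Symbol are excluded, which matches the two errors above.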

Now you get a new error, as shown below

ValueError: You must specify a period or x must be a pandas object with a PeriodIndex or a DatetimeIndex with a freq not set to None

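df['Open'] only has the default integer index (a RangeIndex), so seasonal_decompose cannot work out the seasonal period on its own. There are two solutions to this error. First Solution: pass the period parameter to seasonal_decompose.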

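df_seasonal = seasonal_decompose(df['Open'], period=1)  ## runs without error, but period=1 is only a placeholder

Data sampled every minute does not mean the period is 1. period is not the sampling interval; it is the number of observations in one seasonal cycle. For minute data, an hourly cycle means period=60 and a daily cycle means period=1440 (60 * 24). With period=1 there is no cycle to find: the trend equals the series itself and the seasonal and residual components are all zero. seasonal_decompose also needs at least two complete cycles (2 * period observations), so the 34 rows of sample data are far too short for an hourly or daily cycle. To learn more about how to choose the period, read this page. For the complete list of frequency abbreviations, click here.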

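Second Solution: create a DatetimeIndex with a frequency. seasonal_decompose can then derive the period from the index frequency, but as far as I know statsmodels only maps a limited set of frequencies this way (hourly, daily, business-daily, weekly, monthly, quarterly and yearly). For minute data like this, calling it without period raises a ValueError saying the frequency is not understood, so period is still required. Also, asfreq() needs unique timestamps, and the sample data contains the same 17 rows twice, newest first.

df = df.drop_duplicates().sort_values('Date')  ## remove the repeated rows and put them in chronological order
df = df.set_index('Date').asfreq('min')  ## 'min' = one-minute frequency ('T' in older pandas); the data is already evenly spaced, so no rows are lost
df_seasonal = seasonal_decompose(df['Open'], period=2)  ## period is still needed for minute data; 2 is only a placeholder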

seasonal_decompose also takes a model argument, which defaults to 'additive'; the other option is 'multiplicative'. A rule of thumb for choosing: if the trend and the size of the seasonal variation stay roughly constant over time (the components simply add up, in other words they are linear), use the additive model. If the seasonal variation grows or shrinks along with the trend, use the multiplicative model. So before calling seasonal_decompose, plot the preprocessed data over time and look for trends and cycles. Note that the multiplicative model rejects zero and negative values, so it would raise a ValueError on the Open column here, which contains 0.

With these changes, seasonal_decompose finally runs without error.

Another error you might see is TypeError: Index(...) must be called with a collection of some kind, 'seasonal' was passed. This is again caused by passing more than one column to seasonal_decompose, for example:

df_decomp = seasonal_decompose(df[['Open','High']], period=1)  ## wrong: Open and High are two separate value columns, neither is an index; decompose them one at a time