54

Time series decomposition is a method that separates a time-series data set into three (or more) components. For example:

x(t) = s(t) + m(t) + e(t)

where

t is the time coordinate
x is the data
s is the seasonal component
e is the random error term
m is the trend

In R I would do the functions decompose and stl. How would I do this in python?

sfjac
  • 7,119
  • 5
  • 45
  • 69
user3084006
  • 5,344
  • 11
  • 32
  • 41

5 Answers5

76

I've been having a similar issue and am trying to find the best path forward. Try moving your data into a Pandas DataFrame and then call StatsModels tsa.seasonal_decompose. See the following example:

import statsmodels.api as sm

dta = sm.datasets.co2.load_pandas().data
# deal with missing values. see issue
dta.co2.interpolate(inplace=True)

res = sm.tsa.seasonal_decompose(dta.co2)
resplot = res.plot()

Three plots produced from above input

You can then recover the individual components of the decomposition from:

res.resid
res.seasonal
res.trend

I hope this helps!

AN6U5
  • 2,755
  • 22
  • 20
  • How do you re-construct the original time-series from the components? – vgoklani Oct 09 '16 at 15:38
  • You are able to choose whether to deconstruct them via subtraction or division. Addition is the typical method, in which case you just add the components back together. – AN6U5 Oct 09 '16 at 18:26
  • In that code snippet, which variable holds the input data ? – davneetnarang Oct 25 '16 at 11:33
  • dta.co2 is the input, but you should also be able to access it after the call as res.observed. – AN6U5 Oct 25 '16 at 11:38
  • 2
    Per the StatsModels documentation, this is a naive decomposition compared to STL. Not sure if it's just bad params or what, but you don't want to see this sort of seasonal structure in the remainder series. As cast42 notes below, it might be better to use https://github.com/andreas-h/pyloess –  Feb 20 '18 at 12:51
  • Take a look at Seasonal and Trend decomposition using Loess (STL): https://github.com/jrmontag/STLDecompose – AChervony Nov 25 '19 at 23:50
11

I already answered this question here, but below is a quick function on how to do this with rpy2. This enables you to use R's robust statistical decomposition with loess, but in python!

    import pandas as pd

    from rpy2.robjects import r, pandas2ri
    import numpy as np
    from rpy2.robjects.packages import importr


def decompose(series, frequency, s_window = 'periodic', log = False,  **kwargs):
    '''
    Decompose a time series into seasonal, trend and irregular components using loess, 
    acronym STL.
    https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/stl

    params:
        series: a time series

        frequency: the number of observations per “cycle” 
                   (normally a year, but sometimes a week, a day or an hour)
                   https://robjhyndman.com/hyndsight/seasonal-periods/

        s_window: either the character string "periodic" or the span 
                 (in lags) of the loess window for seasonal extraction, 
                 which should be odd and at least 7, according to Cleveland 
                 et al.

        log:    boolean.  take log of series



        **kwargs:  See other params for stl at 
           https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/stl
    '''

    df = pd.DataFrame()
    df['date'] = series.index
    if log: series = series.pipe(np.log)
    s = [x for x in series.values]
    length = len(series)
    s = r.ts(s, frequency=frequency)
    decomposed = [x for x in r.stl(s, s_window).rx2('time.series')]
    df['observed'] = series.values
    df['trend'] = decomposed[length:2*length]
    df['seasonal'] = decomposed[0:length]
    df['residuals'] = decomposed[2*length:3*length]
    return df

The above function assumes that your series has a datetime index. It returns a dataframe with the individual components that you can then graph with your favorite graphing library.

You can pass the parameters for stl seen here, but change any period to underscore, for example the positional argument in the above function is s_window, but in the above link it is s.window. Also, I found some of the above code on this repository.

Example data

Hopefully the below works, honestly haven't tried it since this is a request long after I answered the question.

import pandas as pd
import numpy as np
obs_per_cycle = 52
observations = obs_per_cycle * 3
data = [v+2*i for i,v in enumerate(np.random.normal(5, 1, observations))]
tidx = pd.date_range('2016-07-01', periods=observations, freq='w')
ts = pd.Series(data=data, index=tidx)
df = decompose(ts, frequency=obs_per_cycle, s_window = 'periodic')
Jeff Tilton
  • 1,256
  • 1
  • 14
  • 28
  • does this library work with Python 3 or can it only be used with Python 2? – Tanguy Apr 01 '18 at 20:49
  • 1
    How does this compare with the statsmodels approach? – dstandish Apr 02 '18 at 18:00
  • @Tanguy, I use this with python3. Make sure to series.dropna() before using. You also need to set your R_HOME environment variable to wherever your R version is located, and you need to download the forecast package. – Jeff Tilton Apr 02 '18 at 18:15
  • @chorbs from [stats](http://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) "This is a naive decomposition. More sophisticated methods should be preferred." From [OTexts](https://www.otexts.org/fpp/6/5) "STL is a very versatile and robust method for decomposing time series...STL has several advantages over the classical decomposition method and X-12-ARIMA..." – Jeff Tilton Apr 02 '18 at 18:17
  • @JeffTilton: unfortunately did not manage to install rpy2 (I am using "Python 3.6.1 :: Anaconda 4.4.0 (64-bit)" under Windows 10), even after setting the environment variable for R, installing the forecast package and (after first having an encoding error) changing encoding settings in pip as suggested in this post https://github.com/pypa/pip/issues/4251#issuecomment-279117184. Guess I should just switch to R :) – Tanguy Apr 03 '18 at 13:35
  • 1
    @Tanguy, those are pretty much my exact settings as well. Using rpy2 can feel a little hacky and if you are going to do a lot of time series analysis R is probably the stronger language of the two right now and will just work. Rob Hyndman's forecast package is solid and just works in R without any of the hassle. – Jeff Tilton Apr 03 '18 at 14:06
  • @JeffTilton Could you pls give an example regarding the parameter `series`?Not sure which datatype to pass. Maybe in terms of a simple Dataframe with date/series?Thanks! – Googme Jun 18 '19 at 12:36
  • 1
    @Googme it just require a pandas series. It is expecting a datetime index, but would probably run regardless. I put an example above, but have not tested it. – Jeff Tilton Jun 19 '19 at 15:26
5

You can call R functions from python using rpy2 Install rpy2 using pip with: pip install rpy2 Then use this wrapper: https://gist.github.com/andreas-h/7808564 to call the STL functionality provided by R

cast42
  • 1,999
  • 1
  • 16
  • 10
0

Have you been introduced to scipy yet? From what I've seen in a few PDFs/sites

Here and Here

it's doable. But without seeing a specific example it would be hard for someone to show you a code example. Scipy is awesome I use it in my research stuff, still haven't been let down by it.

Matt
  • 3,508
  • 6
  • 38
  • 66
  • 3
    Yes I am currently using Scipy including statsmodel, pandas, and numpy. The closest thing I can find is using the `resample` from pandas but that does not allow you to deseason a timeseries. – user3084006 Dec 19 '13 at 04:16
  • **scipy** can *optimize* but cannot give additional needed staff , like **AR-models** from statsmodels.tsa can - including validation of the model & confidence-intervals for the forecast -- using scipy.signal.minimize/maximize you will need to do the rest by youself – JeeyCi Jun 15 '23 at 16:29
  • besides *autocorrelation-factor* make it easy to adjust appropriate period/seasonality for **ARIMA/SARIMA** models when adjusting models from AR-family – JeeyCi Jun 15 '23 at 16:33
0

also Seasonal-Trend decomposition using LOESS (STL)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm

dta = sm.datasets.co2.load_pandas().data

# deal with missing values
dta.co2.interpolate(inplace=True)

#################### sm.tsa.seasonal_decompose
##res = sm.tsa.seasonal_decompose(dta.co2)
##resplot = res.plot()
##plt.show()

#################### Seasonal-Trend decomposition using LOESS (STL)
## https://www.statsmodels.org/dev/examples/notebooks/generated/stl_decomposition.html

from statsmodels.tsa.seasonal import STL

stl = STL(dta.co2, seasonal=13)
##stl = STL(dta.co2, period=12, seasonal_deg=0, trend_deg=1, low_pass_deg=1, robust=True)
res = stl.fit()
fig = res.plot()
plt.show()

trend = res.trend
seasonal = res.seasonal
residual = res.resid

df = pd.concat([res.trend, res.seasonal, res.resid], 1)
print(df)

with STL residuals seems more Normally-distributed at a glance (resembles white-noise in a greater extent)... still commented here:

seasonal_decompose uses moving averages, rather than LOESS.

P.S. modelling with SARIMA gives more reasonable results than just decomposing by-hand -- though results can be comparable - e.g.

JeeyCi
  • 354
  • 2
  • 9