3

I have daily electric load data from 1-1-2007 till 31-12-2016. I use ts() function to load the data like so

ts_load <- ts(data, start = c(2007,1), end = c(2016,12),frequency = 365)

I want to remove the yearly and weekly seasonality from my data, to decompose the data and remove the seasonality, I use the following code

decompose_load = decompose(ts_load, "additive")
deseasonalized = ts_load - decompose_load$seasonal

My question is, am I doing it right? is this the right way to remove the yearly seasonality? and what is the right way to remove the weekly seasonality?

Mohab Mostafa
  • 109
  • 1
  • 8
  • please include the actual data (using `dput(head(data))` so we can help you – morgan121 Dec 25 '18 at 14:28
  • @RAB > dput(head(ts_load)) c(94275, 97269, 98686, 98262, 96839, 97398) – Mohab Mostafa Dec 25 '18 at 15:35
  • 2
    Please add the data to your original comment. See: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Roman Dec 25 '18 at 15:53

2 Answers2

3

A few points:

  • a ts series must have regularly spaced points and the same number of points in each cycle. In the question a frequency of 365 is specified but some years, i.e. leap years, would have 366 points. In particular, if you want the frequency to be a year then you can't use daily or weekly data without adjustment since different years have different numbers of days and the number of weeks in a year is not integer.

  • decompose does not handle multiple seasonalities. If by weekly you mean remove the effect of Monday, of Tuesday, etc. and if by yearly you mean remove the effect of being 1st of the year, 2nd of the year, etc. then you are asking for multiple seasonalities.

  • end = c(2017, 12) means the 12th day of 2017 since frequency is 365.

The msts function in the forecast package can handle multiple and non-integer seasonalities.

Staying with base R, another approach is to approximate it by a linear model avoiding all the above problems (but ignoring correlations) and we will discuss that.

Assuming the data shown reproducibly in the Note at the end we define the day of week, dow, and day of year, doy, variables and regress on those with an intercept and trend and then construct just the intercept plus trend plus residuals in the last line of code to deseasonalize. This isn't absolutely necessary but we have used scale to remove the mean of trend in order that the three terms defining data.ds are mutually orthogonal -- Whether or not we do this the third term will be orthogonal to the other 2 by the properties of linear models.

trend <- scale(seq_along(d), TRUE, FALSE)
dow <- format(d, "%a")
doy <- format(d, "%j")
fm <- lm(data ~ trend + dow + doy)
data.ds <- coef(fm)[1] + coef(fm)[2] * trend + resid(fm)

Note

Test data used in reproducible form:

set.seed(123)
d <- seq(as.Date("2007-01-01"), as.Date("2016-12-31"), "day")
n <- length(d)
trend <- 1:n
seas_week <- rep(1:7, length = n)
seas_year <- rep(1:365, length = n)
noise <- rnorm(n)
data <- trend + seas_week + seas_year + noise
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

you can use the dsa function in the dsa package to adjust a daily time series. The advantage over the regression solution is, that it takes into account that the impact of the season can change over time, which is usually the case. In order to use that function, your data should be in the xts format (from the xts package). Because in that case the leap year is not ignored. The code will then look something like this:

install.packages(c("xts", "dsa"))

data = rnorm(365.25*10, 100, 1)
data_xts <- xts::xts(data, seq.Date(as.Date("2007-01-01"), by="days", length.out = length(data)))

sa = dsa::dsa(data_xts, fourier_number = 24) 
# the fourier_number is used to model monthly recurring seasonal patterns in the regARIMA part

data_adjusted <- sa$output[,1]