I have a data frame with 13 columns that I want to convert to a time series so that I can perform a seasonal decomposition with stl().

My dataframe looks as follows:

> head(wideRawDF)
    Period.Start.Time DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2 DO0182U21A3
1 2017-01-20 16:30:00     -101.50     -103.37     -103.86     -104.78     -104.95     -105.33
2 2017-01-20 16:45:00     -101.32     -102.75     -104.22     -104.51     -103.94     -105.29
3 2017-01-20 17:00:00     -101.45     -103.30     -103.93     -104.70     -104.82     -105.13
4 2017-01-20 17:15:00     -100.91      -95.92      -99.22     -103.83     -104.72     -105.19
5 2017-01-20 17:30:00     -100.91     -103.04     -104.09     -102.15     -104.91     -105.18
6 2017-01-20 17:45:00     -100.97     -103.67     -104.12     -105.07     -104.23      -97.48
  DO0182U21B1 DO0182U21B2 DO0182U21B3 DO0182U21C1 DO0182U21C2 DO0182U21C3
1     -102.50      -99.43     -104.05     -104.51     -104.42     -105.17
2     -102.82     -101.99     -103.94     -104.74     -104.65     -105.25
3     -103.72     -103.95     -104.25     -105.02     -105.04     -105.32
4     -103.57     -101.36     -104.09     -103.90     -102.95     -105.16
5     -103.88     -104.09     -103.96     -104.75     -105.07     -105.23
6     -103.92     -103.89     -104.01     -105.08     -105.14     -104.89

As you can see, my data is in 15-minute intervals.

I have tried converting this to a time series using the following code:

wideRawTS <- as.ts(wideRawDF, start = head(index(wideRawDF), 1), end = tail(index(wideRawDF), 1), frequency = 1)

I used frequency equal to 1 since I have 1343 rows of data, each one representing the sampling period.

1343/(14*24*4) = 0.999 => 1

wideRawTS looks as follows:

head(wideRawTS)
     Period.Start.Time DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2 DO0182U21A3 DO0182U21B1 DO0182U21B2 DO0182U21B3
[1,]        1484929800     -101.50     -103.37     -103.86     -104.78     -104.95     -105.33     -102.50      -99.43     -104.05
[2,]        1484930700     -101.32     -102.75     -104.22     -104.51     -103.94     -105.29     -102.82     -101.99     -103.94
[3,]        1484931600     -101.45     -103.30     -103.93     -104.70     -104.82     -105.13     -103.72     -103.95     -104.25
[4,]        1484932500     -100.91      -95.92      -99.22     -103.83     -104.72     -105.19     -103.57     -101.36     -104.09
[5,]        1484933400     -100.91     -103.04     -104.09     -102.15     -104.91     -105.18     -103.88     -104.09     -103.96
[6,]        1484934300     -100.97     -103.67     -104.12     -105.07     -104.23      -97.48     -103.92     -103.89     -104.01
     DO0182U21C1 DO0182U21C2 DO0182U21C3
[1,]     -104.51     -104.42     -105.17
[2,]     -104.74     -104.65     -105.25
[3,]     -105.02     -105.04     -105.32
[4,]     -103.90     -102.95     -105.16
[5,]     -104.75     -105.07     -105.23
[6,]     -105.08     -105.14     -104.89

I believe the Period.Start.Time variable has been converted to Unix epoch time, i.e. the number of seconds since Jan 1, 1970.
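A quick way to check that reading, as a small sketch that assumes the timestamps are in GMT (an assumption, though it is consistent with the values printed above):

```r
# Assumption: the original timestamps are in GMT
t0 <- as.POSIXct("2017-01-20 16:30:00", tz = "GMT")

# POSIXct is stored as seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(t0)
print(secs)  # 1484929800, the first value in wideRawTS

# Converting the number back recovers the original timestamp
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "GMT")
print(back)
```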

I have subsequently tried passing the time series data, wideRawTS to stl() but now get:

stl(wideRawTS[,2])
Error in stl(wideRawTS[, 2]) : 
  series is not periodic or has less than two periods

I've checked the first few epoch values and they are the correct representations of the original data so I don't know what's going on!

If someone would be so kind as to show me the error of my ways, I would be most grateful.

TheGoat
https://stackoverflow.com/questions/21123039/error-when-trying-to-use-stl-and-decompose-functions-in-r I think this question can help you with yours. From my understanding, frequency is the number of observations per unit of time (e.g. per year). With frequency = 1, each 15-minute sample is its own period, but stl() expects multiple observations within each unit of time; otherwise your ts has less than two periods, as the error message says. – TooYoung Jun 12 '17 at 20:08

1 Answer

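It seems the problem is the frequency you specified for your data: stl() needs a seasonal period of at least 2 observations (frequency >= 2) and more than two full periods of data, so frequency = 1 triggers this error. Look here, here and here, maybe it will help.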
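As a rough sketch of what that means for 15-minute data: if you assume the seasonal cycle is daily (my assumption, it is not stated in the question), one period is 24 * 4 = 96 observations, and 1343 rows cover about 14 periods. The series below is synthetic and only stands in for one column of wideRawDF.

```r
# Synthetic stand-in for one column of wideRawDF (hypothetical data):
# 1343 observations at 15-minute spacing with a daily cycle plus noise
set.seed(1)
n <- 1343
per_day <- 24 * 4  # 96 observations per day
x <- -104 + sin(2 * pi * seq_len(n) / per_day) + rnorm(n, sd = 0.5)

# frequency = 1 reproduces the error from the question
res <- try(stl(ts(x, frequency = 1), s.window = "periodic"), silent = TRUE)
print(inherits(res, "try-error"))

# frequency = 96 treats one day as the seasonal period
x_ts <- ts(x, frequency = per_day)
fit <- stl(x_ts, s.window = "periodic", robust = TRUE)
print(head(fit$time.series))
```

Note that stl() only accepts a univariate series, so with the real data you would pass one column at a time.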

Also, would it be an option to use the xts package for time series manipulation?

library(xts)

Sys.setenv(TZ='GMT')

# Rebuild the first rows of the question's data (first seven columns only)
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, text = '
Period.Start.Time,DO0182U09A3,DO0182U09B3,DO0182U09C3,DO0182U21A1,DO0182U21A2,DO0182U21A3
"2017-01-20 16:30:00",-101.50,-103.37,-103.86,-104.78,-104.95,-105.33
"2017-01-20 16:45:00",-101.32,-102.75,-104.22,-104.51,-103.94,-105.29
"2017-01-20 17:00:00",-101.45,-103.30,-103.93,-104.70,-104.82,-105.13
"2017-01-20 17:15:00",-100.91,-95.92,-99.22,-103.83,-104.72,-105.19
"2017-01-20 17:30:00",-100.91,-103.04,-104.09,-102.15,-104.91,-105.18
"2017-01-20 17:45:00",-100.97,-103.67,-104.12,-105.07,-104.23,-97.48
')

df2 <- xts(x = df[,-1], order.by = as.POSIXct(df[,1]))

It works for dummy data with the same number of rows.

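dummy <- xts(x = rnorm(1343), order.by = as.POSIXct("2017-01-20 16:30:00") + 15*60*(1:1343))
# Note: this call decomposes the numeric time index (epoch seconds), not the values,
# and frequency = 12 makes R label the rows as months; see the trend column below.
# To decompose the values with a daily period, one option is
# ts(as.numeric(coredata(dummy)), frequency = 96).
stl(ts(as.numeric(index(dummy)), frequency=12), s.window="periodic", robust=TRUE)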
Components
             seasonal      trend     remainder
Jan   1 -1.165038e-07 1484930700 -2.145767e-06
Feb   1  2.053829e-07 1484931600 -1.192093e-06
Mar   1 -2.190031e-08 1484932500  2.384186e-07
Apr   1 -1.643545e-07 1484933400  7.152557e-07
May   1 -5.919005e-09 1484934300  9.536743e-07
Jun   1  1.653720e-07 1484935200  2.384186e-07
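For comparison, here is a minimal sketch that decomposes the values of an xts series rather than its time index. It assumes a daily seasonal period (frequency = 96 for 15-minute data), which is my assumption and not something stated in the question, and it starts the dummy index at 16:30 to mirror the question's first row.

```r
library(xts)

# Dummy series: 1343 random values at 15-minute spacing (hypothetical data)
set.seed(42)
idx <- as.POSIXct("2017-01-20 16:30:00", tz = "GMT") + 15 * 60 * (0:1342)
dummy <- xts(x = rnorm(1343), order.by = idx)

# coredata() extracts the values; index() would return the timestamps instead
vals <- ts(as.numeric(coredata(dummy)), frequency = 96)

fit <- stl(vals, s.window = "periodic", robust = TRUE)
print(head(fit$time.series))
```

With the real data the same call should apply to one column at a time, e.g. coredata(df2[, "DO0182U09A3"]), provided the series spans more than two days.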