I have hourly data like the sample below that I am trying to create a time-series from and use the window function with. My end goal is to use this to train an Arima model. I'm having a hard time getting ts() or window() to work with my date time format. I've also tried using msts() but couldn't get it to work with the date time format. I have gotten xts() to work, but it doesn't seem to work correctly with the window() or Arima().

Is it possible to use this date time format with ts() and the window() function? Any tips are greatly appreciated.

Code:

tsData <- ts(SampleData$MedTime[1:24],start='2015-01-01 00:00', frequency=168)

train <- window(tsData,end='2015-01-01 15:00')

Edit note: The data for this problem has been truncated to only 24 observations from the initial 525 provided. As a result, the window() call has also been modified to use a time within the truncated range.

Data:

dput(SampleData[1:24,c("DateTime","MedTime")])

SampleData = structure(list(DateTime = c("2015-01-01 00:00", "2015-01-01 01:00", "2015-01-01 02:00", "2015-01-01 03:00", "2015-01-01 04:00", "2015-01-01 05:00", "2015-01-01 06:00", "2015-01-01 07:00", "2015-01-01 08:00", "2015-01-01 09:00", "2015-01-01 10:00", "2015-01-01 11:00", "2015-01-01 12:00", "2015-01-01 13:00", "2015-01-01 14:00", "2015-01-01 15:00", "2015-01-01 16:00", "2015-01-01 17:00", "2015-01-01 18:00", "2015-01-01 19:00", "2015-01-01 20:00", "2015-01-01 21:00", "2015-01-01 22:00", "2015-01-01 23:00"), MedTime = c(11, 14, 17, 5, 5, 5.5, 8, NA, 5.5, 6.5, 8.5, 4, 5, 9, 10, 11, 7, 6, 7, 7, 5, 6, 9, 9)), .Names = c("DateTime", "MedTime"), row.names = c(NA, 24L), class = "data.frame")

coatless
user3476463

1 Answer

Time Series in R

A ts object has a few limitations. Most notably, it doesn't accept a time stamp per observation. Instead, ts() takes a start and a frequency (the end is optional), and both start and end must be numeric rather than date strings. Furthermore, frequency is limited to describing the data in terms of seasons, i.e. the number of observations per seasonal period.

Type      Frequency 
Annual     1
Quarterly  4
Monthly   12
Weekly    52

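Thus, to generate the correct "season" for hourly data we would have to specify a daily seasonality with frequency=24 (minute-level data would instead need frequency=1440, i.e. 24*60). It gets a bit more complicated after that, because the time index is then a plain number of days rather than a date time.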

As a result, I would highly suggest creating the time series with an xts or zoo object.
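If you do want to stay with ts(), here is a minimal sketch of how the pieces can fit together for hourly data. It assumes that time is counted in days, so that start = c(1, 1) stands for 00:00 on the first day; that convention is an assumption made for illustration, not something ts() derives from the DateTime column.

```r
# Hourly values from the question's 24-observation sample (one NA)
med_time <- c(11, 14, 17, 5, 5, 5.5, 8, NA, 5.5, 6.5, 8.5, 4,
              5, 9, 10, 11, 7, 6, 7, 7, 5, 6, 9, 9)

# Assumption: one "season" is a day, so frequency = 24 observations per day.
# start = c(1, 1) means day 1, first hourly slot (00:00).
ts_data <- ts(med_time, start = c(1, 1), frequency = 24)

# window() on a ts object needs numeric times or c(period, position) pairs,
# not date strings. c(1, 16) is day 1, 16th hourly slot, i.e. 15:00.
train <- window(ts_data, end = c(1, 16))

length(train)
```

The drawback is visible here: you have to translate every date time into a (day, hour slot) pair by hand, which is why an index-aware class such as xts or zoo is more convenient.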

Creating a datetime stamp

Next up, one of the reasons for your windowing issues is that the date you are supplying is a string and not a POSIXct or POSIXlt object. The former is preferred.

A full breakdown can be found in:

Difference between as.POSIXct/as.POSIXlt and strptime for converting character vectors to POSIXct/POSIXlt

Dealing with timestamps in R

With that being said, one of the first steps is to convert your data from character form to POSIXct:

# Convert the character column to POSIXct
SampleData$DateTime = as.POSIXct(strptime(SampleData$DateTime, format ="%Y-%m-%d %H:%M"))
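As a quick check on the conversion, the sketch below parses a few of the sample stamps and confirms they are one hour apart. The explicit tz = "UTC" is an assumption added here so that the result does not depend on the local time zone or on daylight saving gaps; the original call relies on the session's time zone.

```r
# First three stamps from the question's sample
date_time <- c("2015-01-01 00:00", "2015-01-01 01:00", "2015-01-01 02:00")

# as.POSIXct() can parse the strings directly when given the format.
# tz = "UTC" is an assumption for this sketch, not part of the original answer.
stamps <- as.POSIXct(date_time, format = "%Y-%m-%d %H:%M", tz = "UTC")

class(stamps)   # should include "POSIXct"
diff(stamps)    # spacing between consecutive observations
```

Any stamp that fails to parse becomes NA, so anyNA(stamps) is a cheap sanity check before building the series.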

Windowing

From there, the windowing issue becomes trivial if we create an xts object.

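# install.packages("xts")
library(xts)

# Create an xts object to hold the time series, indexed by the POSIXct stamps
sdts = xts(SampleData$MedTime, order.by = SampleData$DateTime)

# Subset the training data; the end must be a POSIXct object, not a string.
# The end time matches the truncated 24-observation sample from the question.
train = window(sdts, end = as.POSIXct('2015-01-01 15:00', format = "%Y-%m-%d %H:%M"))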
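Since the end goal is an ARIMA model, here is a self-contained sketch of one way to get from an xts training window to a fit. It uses xts's date-range string subsetting as an alternative to window(), and base R's arima() rather than forecast's Arima(). The AR(1) order, the tz = "UTC" setting, and the names stamps, train_ts and fit are illustrative assumptions, not a recommendation for this data.

```r
library(xts)

# The question's 24 hourly observations, rebuilt so the sketch runs on its own
med_time <- c(11, 14, 17, 5, 5, 5.5, 8, NA, 5.5, 6.5, 8.5, 4,
              5, 9, 10, 11, 7, 6, 7, 7, 5, 6, 9, 9)
stamps <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
              by = "hour", length.out = 24)
sdts <- xts(med_time, order.by = stamps)

# xts accepts range strings: everything up to and including 15:00
train <- sdts["/2015-01-01 15:00"]

# arima() expects a plain numeric vector or ts, so drop the xts index.
# frequency = 24 records the assumed daily seasonality of hourly data.
train_ts <- ts(as.numeric(train), frequency = 24)

# Illustrative order only; arima() tolerates the missing value in the series
fit <- arima(train_ts, order = c(1, 0, 0))
fit
```

With only 16 training points the estimates are not meaningful; on the full data set you would choose the order with the usual diagnostics.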
coatless
  • why did you say that 24*60 is the daily frequency? If the monthly, weekly frequencies are 12 and 52 respectively, shouldn't the daily frequency be 365? – Ahmadov Aug 31 '16 at 16:23
  • 24*60 is the daily minute frequency. 24*2 is the half-hour frequency. 24*3600 is the daily second frequency and so on. The 365 in this case would be the daily yearly seasonality. The correct seasonality should indeed be 24 and if data exists for a year, then 8766 (365.25 * 24). – coatless Aug 31 '16 at 20:00