I have a daily time series of the number of visitors to a web site. My series runs from 01/06/2014 until today, 14/10/2015, and I want to predict the number of visitors in the future. How can I read my series into R? I'm thinking:

series <- ts(visitors, frequency=365, start=c(2014, 6)) 

If so, after fitting my time series model with arimadata=auto.arima(), I want to predict the number of visitors for the next 60 days. How can I do this?

h=..?
forecast(arimadata,h=..), 

What should the value of h be? Thanks in advance for your help.

Shawn Mehan
max
  • I suggest checking out the examples on the website of the package developer, Rob J. Hyndman (http://robjhyndman.com/talks/MelbourneRUGexamples.R) – GWD Oct 14 '15 at 15:31
  • @WD11 thanks for your link, but I have not found an example like my dataset – max Oct 14 '15 at 15:42
  • Search for "h =" on that site; there you will find examples for 30 days and/or 12 months # Exponential smoothing; fit1 <- ets(beertrain, model="ANN", damped=FALSE); fit2 <- ets(beertrain); fcast1 <- forecast(fit1, h=8); fcast2 <- forecast(fit2, h=8); – GWD Oct 14 '15 at 15:45
  • Here you have monthly data forecast 8 months into the future ... have a look at the beertrain time series (ts) object and later compare it to the fcast forecast object ... modify the idea according to your needs; there is also an ARIMA model somewhere in there; i.e. my guess would be h = 60 based on your info ... – GWD Oct 14 '15 at 15:52

4 Answers

The ts specification is wrong; if you are setting this up as daily observations, then you need to specify what day of the year 2014 is June 1st and specify this in start:

## Create a daily Date object - helps my work on dates
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

## Create a time series object
set.seed(25)
myts <- ts(rnorm(length(inds)),     # random data
           start = c(2014, as.numeric(format(inds[1], "%j"))),
           frequency = 365)

Note that I specify start as c(2014, as.numeric(format(inds[1], "%j"))). All the complicated bit is doing is working out what day of the year June 1st is:

> as.numeric(format(inds[1], "%j"))
[1] 152
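As a quick sanity check, here is a minimal sketch (base R only) that builds a placeholder series with the same start and frequency and asks R where it thinks the series begins. The name myts_check and the all-zero data are assumptions for illustration, not part of the original example:

```r
## Daily index covering the period in the question
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

## Day of the year for the first observation (1 June 2014)
doy <- as.numeric(format(inds[1], "%j"))

## Placeholder data: zeros stand in for the real visitor counts
myts_check <- ts(numeric(length(inds)), start = c(2014, doy), frequency = 365)

start(myts_check)    # year and position within the year
time(myts_check)[1]  # fractional year of the first observation
```

If start() reports the year 2014 together with the day number computed above, the series is anchored on the intended date.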

Once you have this, you're effectively there:

## load the forecast package, which provides auto.arima() and forecast()
library("forecast")
## use auto.arima to choose ARIMA terms
fit <- auto.arima(myts)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)
## plot it
plot(fore)

[Figure: plot of the 60-day forecast for the random daily series]

That seems suitable given the random data I supplied...

You'll need to select appropriate arguments for auto.arima() as suits your data.

Note that the x-axis labels are in fractional years: a label ending in .5 marks half-way (6 months) through that year.

Doing this via zoo

This might be easier to do via a zoo object created using the zoo package:

## load zoo and create the zoo object from the same random data as before
library("zoo")
set.seed(25)
myzoo <- zoo(rnorm(length(inds)), inds)

Note you now don't need to specify any start or frequency info; just use inds computed earlier from the daily Date object.

Proceed as before

## use auto.arima to choose ARIMA terms
fit <- auto.arima(myzoo)
## forecast for next 60 time points
fore <- forecast(fit, h = 60)

The plot, though, will cause an issue because the x-axis is in days since the epoch (1970-01-01), so we need to suppress the automatic plotting of this axis and then draw our own. This is easy as we have inds:

## plot it
plot(fore, xaxt = "n")    # no x-axis 
Axis(inds, side = 1)

This only produces a couple of labeled ticks; if you want more control, tell R where you want the ticks and labels:

## plot it
plot(fore, xaxt = "n")    # no x-axis 
Axis(inds, side = 1,
     at = seq(inds[1], tail(inds, 1) + 60, by = "3 months"),
     format = "%b %Y")

Here we plot every 3 months.
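If you want the calendar dates that the 60 forecast values correspond to, rather than only axis labels, here is a minimal sketch in base R. The names h and fc_dates are introduced for illustration, and the last (commented) line assumes a fore object created by forecast(fit, h = 60) as above:

```r
## Observed daily index, as in the answer
inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")

h <- 60  # forecast horizon in days

## Dates covered by the forecast: the h days after the last observation
fc_dates <- seq(tail(inds, 1) + 1, by = "day", length.out = h)

range(fc_dates)

## Format as yy/mm/dd
head(format(fc_dates, "%y/%m/%d"))

## If `fore` from forecast(fit, h = 60) is available, its point forecasts
## (stored in fore$mean) could be paired with these dates:
## data.frame(date = fc_dates, forecast = as.numeric(fore$mean))
```

Because the data are daily, each forecast step is one day, so the horizon runs from the day after the last observation for h days.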

Gavin Simpson
  • Thanks very much for your good explanation. I just have one question: what does 2015.5 mean in the graph? How can I get the exact date format yy/mm/dd? Thanks in advance – max Oct 14 '15 at 16:01
  • Gavin Simpson: thanks for your quick and good explanation – max Oct 14 '15 at 16:42
  • It means half a year (6 months). Getting nicer axis labelling is probably easier with a **zoo** object. – Gavin Simpson Oct 14 '15 at 17:06
  • Gavin Simpson: thanks for your help. Do you have any idea how I can interpret the ACF and PACF plots? – max Oct 15 '15 at 14:05
  • Thanks for the help, this saved me a lot of headache. Just to extend the answer a bit further, you can treat the 'zoo' series as regular time series data. I applied it successfully with Naive, HoltWinters, and SES. Template code below: `inds <- seq(as.Date("2014-06-01"), as.Date("2015-10-14"), by = "day")`; `set.seed(25)`; `myzoo <- zoo(rnorm(length(inds)), inds)`; `fit <- auto.arima(myzoo)`; `fore <- forecast(fit, h = 60)` (forecast for the next 60 time points) – jcdevilleres Mar 16 '20 at 17:29
  • Hi @GavinSimpson . Do I need to specify the 'day' even if I'm importing a csv data and the data includes dates? – jackDanielle May 20 '20 at 07:05
  • @jackDanielle If you use the **zoo** version then no you don't need to specify the day. At the top of the answer I created a vector of dates `inds` which I pass as the indices to `zoo()` along side the data. Your dates in the CSV file should take the place of `inds` in the example using `zoo()`. – Gavin Simpson May 20 '20 at 15:33

A ts object does not work well for daily time series. I suggest you use the zoo package instead.

library(zoo)
zoo(visitors, seq(from = as.Date("2014-06-01"), to = as.Date("2015-10-14"), by = 1))
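
If your dates come with the data (for example from a CSV file), they can serve as the index directly. A minimal sketch, assuming a data frame with a `date` column in day/month/year format and a `visitors` column; the column names and the four values are made up for illustration:

```r
## Hypothetical input: in practice this would come from read.csv();
## the column names `date` and `visitors` are assumptions
raw <- data.frame(
  date     = c("01/06/2014", "02/06/2014", "03/06/2014", "04/06/2014"),
  visitors = c(120, 135, 128, 150)
)

## Parse the day/month/year strings into Date objects
raw$date <- as.Date(raw$date, format = "%d/%m/%Y")

## The parsed dates become the index of the zoo series
if (requireNamespace("zoo", quietly = TRUE)) {
  visits <- zoo::zoo(raw$visitors, order.by = raw$date)
  print(visits)
}
```

The requireNamespace() guard only keeps the sketch from failing when zoo is not installed; with zoo loaded you would call zoo() directly.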
UseR10085
Amol Modi
  • Thank you for the clear sample. Just to extend this answer further: you can now treat the 'zoo' series as a regular time series, as in `plot(myzoo)`, or `fit <- auto.arima(myzoo)` to choose the ARIMA terms and `fore <- forecast(fit, h = 60)` to forecast the next 60 time points – jcdevilleres Mar 16 '20 at 17:27

Here's how I created a time series when I was given daily observations with quite a few days missing. @gavin-simpson's answer was a big help. Hopefully this saves someone some grief.

The original data looked something like this:

library(lubridate)
set.seed(42)
minday = as.Date("2001-01-01")
maxday = as.Date("2005-12-31")
dates <- seq(minday, maxday, "days")
dates <- dates[sample(1:length(dates),length(dates)/4)] # create some holes
df <- data.frame(date=sort(dates), val=sin(seq(from=0, to=2*pi, length=length(dates))))

To create a time-series with this data I created a 'dummy' dataframe with one row per date and merged that with the existing dataframe:

df <- merge(df, data.frame(date=seq(minday, maxday, "days")), all=TRUE)

This dataframe can be cast into a timeseries. Missing dates are NA.

nts <- ts(df$val, frequency=365, start=c(year(minday), as.numeric(format(minday, "%j"))))
plot(nts)

[Figure: holey sine wave]
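
To see what the merge does on a small scale, here is a minimal sketch in base R with three made-up observations and one missing day. The names obs, all_days and full are introduced for illustration only:

```r
## Three observations; 3 January is missing
obs <- data.frame(
  date = as.Date(c("2001-01-01", "2001-01-02", "2001-01-04")),
  val  = c(10, 11, 13)
)

## One row per calendar day between the first and last observation
all_days <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

## Outer join on `date`: days without an observation get val = NA
full <- merge(obs, all_days, all = TRUE)
full

## Count the gaps before modelling
sum(is.na(full$val))
```

Counting the NA values first is worthwhile, since not every modelling function accepts missing values.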

keithpjolley

series <- ts(visitors, frequency=365, start=c(2014, 152))

152 is the day of the year of 01-06-2014. Because frequency=365, the second element of start is a day number within the year, so the series starts at day 152 of 2014. To forecast 60 days ahead, use h=60.

forecast(arimadata , h=60)
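
To illustrate that the horizon counts observations (so 60 steps of a daily series means 60 days), here is a minimal sketch that uses only base R. Note the substitutions: arima() with a fixed AR(1) order stands in for auto.arima(), predict() with n.ahead stands in for forecast() with h, and the random data and the names visitors_demo, fit_demo and pred are assumptions:

```r
set.seed(1)
## Random placeholder for the visitor counts (501 daily values)
visitors_demo <- ts(rnorm(501), frequency = 365, start = c(2014, 152))

## Base R stand-in for auto.arima(): a fixed AR(1) model
fit_demo <- arima(visitors_demo, order = c(1, 0, 0))

## n.ahead plays the role of h: 60 steps of a daily series = 60 days
pred <- predict(fit_demo, n.ahead = 60)

length(pred$pred)  # one forecast per future day
```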