2

Here example of my dataset

ts=structure(list(Data = structure(c(10L, 14L, 18L, 22L, 26L, 29L, 
32L, 35L, 38L, 1L, 4L, 7L, 11L, 15L, 19L, 23L, 27L, 30L, 33L, 
36L, 39L, 2L, 5L, 8L, 12L, 16L, 20L, 24L, 28L, 31L, 34L, 37L, 
40L, 3L, 6L, 9L, 13L, 17L, 21L, 25L), .Label = c("01.01.2018", 
"01.01.2019", "01.01.2020", "01.02.2018", "01.02.2019", "01.02.2020", 
"01.03.2018", "01.03.2019", "01.03.2020", "01.04.2017", "01.04.2018", 
"01.04.2019", "01.04.2020", "01.05.2017", "01.05.2018", "01.05.2019", 
"01.05.2020", "01.06.2017", "01.06.2018", "01.06.2019", "01.06.2020", 
"01.07.2017", "01.07.2018", "01.07.2019", "01.07.2020", "01.08.2017", 
"01.08.2018", "01.08.2019", "01.09.2017", "01.09.2018", "01.09.2019", 
"01.10.2017", "01.10.2018", "01.10.2019", "01.11.2017", "01.11.2018", 
"01.11.2019", "01.12.2017", "01.12.2018", "01.12.2019"), class = "factor"), 
    client = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("Horns", "Kornev"), class = "factor"), stuff = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("chickens", 
    "hooves", "Oysters"), class = "factor"), Sales = c(374L, 
    12L, 120L, 242L, 227L, 268L, 280L, 419L, 12L, 172L, 336L, 
    117L, 108L, 150L, 90L, 117L, 116L, 146L, 120L, 211L, 213L, 
    67L, 146L, 118L, 152L, 122L, 201L, 497L, 522L, 65L, 268L, 
    441L, 247L, 348L, 445L, 477L, 62L, 226L, 476L, 306L)), .Names = c("Data", 
"client", "stuff", "Sales"), class = "data.frame", row.names = c(NA, 
-40L))

I want perform time series using Arima models by group

#if using dummy
fun_tslm <- function(x, start = "2017-01-04", freq = 12){
  tsw <- ts(x[["Sales"]], start = decimal_date(as.Date(start)), frequency = freq)
  #View(tsw)
  mytslm <- tslm(tsw ~ trend + season)
  mytslm
}

fun_forecast <- function(x, h = 14){
  residarima1 <- auto.arima(x[["residuals"]])
  residualsArimaForecast <- forecast(residarima1, h = h)
  residualsF <- as.numeric(residualsArimaForecast$mean)
  regressionForecast <- forecast(x, h = h)
  regressionF <- as.numeric(regressionForecast$mean)
  forecastR <- regressionF + residualsF
  forecastR
}

tslm_list <- lapply(group_list, fun_tslm)
fore_list <- lapply(tslm_list, fun_forecast)

When I run this script I got the error

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor season has new levels 4

But indeed I want to get output with Arima metrics where I can see 1.forecast initial value

enter image description here

2.forecast for 14 monthes with CI

enter image description here

outputs for initial values and forecasted values should be in two different data.frame. How to do it?

s__
  • 9,270
  • 3
  • 27
  • 45
psysky
  • 3,037
  • 5
  • 28
  • 64
  • It seems you're using some foreign packages, could you add them to your script? Also, your final lines imply the use of `group_list` , whom does not appears before (maybe is it `ts`?). – s__ Dec 24 '18 at 13:09

1 Answers1

2

Some parts are not too much clear in your script and your data, so I can try to give you a partial answer, to see how to get the result you want:

# I called your dataset in this way, because ts is a function
timeseries

Now, the idea is to convert to a list your data frame, each component of the list is a group, that is a time series. I imagined that each group is client + stuff, but we can manage it in different way:

# first the grouping variable
timeseries$group <- paste0(timeseries$client,timeseries$stuff)

# EDIT here you convert the Data class as class(date)
library(lubridate)
timeseries$Data <- dmy(timeseries$Data)

# now the list
listed <- split(timeseries,timeseries$group)

Now we have to define each component of the list as a time series, using lapply and ts function:

 # I do not understand why all your ts start with "2017-01-04", when in the example they are not (probably because it's an example)

 # EDIT: convert the start date
 listed_ts <- lapply(listed,
                     function(x) ts(x[["Sales"]], start = ymd("2017-01-04"), frequency = 12)  ) 

    listed_ts
$`Hornschickens`
      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
17170 374  12 120 242 227 268 280 419  12 172 336

$Hornshooves
      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
17170 497 522  65 268 441 247 348 445 477  62 226 476
17171 306                                            

$KornevOysters
      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
17170 117 108 150  90 117 116 146 120 211 213  67 146
17171 118 152 122 201       

The next step is to auto.arima each time series, with the lapply logic:

library(forecast)
listed_arima <- lapply(listed_ts,function(x) auto.arima(x) )
# partial result
> listed_arima
$`Hornschickens`
Series: x 
ARIMA(0,0,0) with non-zero mean 

Coefficients:
          mean
      223.8182
s.e.   38.7707

sigma^2 estimated as 18188:  log likelihood=-69.03
AIC=142.06   AICc=143.56   BIC=142.86
...

Now the forecast for each arima:

listed_forecast <- lapply(listed_arima,function(x) forecast(x,1) )

If you need to flat it down to a data.frame, do.call and rbind help:

do.call(rbind,listed_forecast)

              method                            model   level     mean     lower     upper     x          series fitted     residuals 
Hornschickens "ARIMA(0,0,0) with non-zero mean" List,18 Numeric,2 223.8182 Numeric,2 Numeric,2 Integer,11 "x"    Numeric,11 Numeric,11
Hornshooves   "ARIMA(0,0,0) with non-zero mean" List,18 Numeric,2 336.9231 Numeric,2 Numeric,2 Integer,13 "x"    Numeric,13 Numeric,13
KornevOysters "ARIMA(0,0,0) with non-zero mean" List,18 Numeric,2 137.125  Numeric,2 Numeric,2 Integer,16 "x"    Numeric,16 Numeric,16

I think you can twist it a bit more to have a better result. And if you are wondering why for this example, if you put more than 1 in the auto.arima function to predict, but the result is a constant, the answer is here, also pointed out by the method column on the output.

s__
  • 9,270
  • 3
  • 27
  • 45
  • 1
    good answer, but any questions.listed_forecast <- lapply(listed_arima,function(x) forecast(x,1) ) provides date 2017.925 , how to get normal date like yyyy-mm-dd)) – psysky Dec 25 '18 at 08:06
  • 1
    second question .listed_forecast <- lapply(listed_arima,function(x) forecast(x,1) ) gives a forecast for the future or for the current dates, which in the initial data. Cause i need forecast for real data 01-04-2017 to 01-07-2020 and forecast from 01-07-2020 to... with hystoric mean ok, i have read, now it doesn't matter. – psysky Dec 25 '18 at 08:35
  • 1
    ,I want ask how can i get predicted by auto.arima value for my initial data? Okey let's see in my example Sales for 01.04.2017 Horns chickens=374. Right? How can I see what value the auto.arima model predicted for this date? Do you understand what i want? – psysky Dec 25 '18 at 09:33
  • 1
    what about first question ,i use dmy from lubridate timeseries$Data=dmy(timeseries$Data) listed_ts <- lapply(listed, function(x) ts(x[["Sales"]], start = decimal_date(as.Date("2017-01-04")), frequency = 12) ) listed_ts and format again 2017.925. What i did wrong? – psysky Dec 25 '18 at 09:41
  • 1
    Edited, the things to change are two, let me know if it's not clear. – s__ Dec 25 '18 at 09:59
  • 1
    simple question 17170 what does mean? :)) – psysky Dec 25 '18 at 10:02
  • It seems the classic R way to help user to know who is the first element of a row, in this case seems the result of `as.numeric(ymd("2017-01-04"))`. – s__ Dec 25 '18 at 10:06
  • ,Yep, you are right. .:) and how convert 17170 back to "2017-01-04" in output?) – psysky Dec 25 '18 at 10:12
  • 1
    I do not [think it's possible](https://stackoverflow.com/questions/33128865/starting-a-daily-time-series-in-r), but I advice you to ask a new question about it, the community always make nice surprise. Also, I advice you to look [this](http://127.0.0.1:16566/library/stats/html/ts.html), to not have weird result, for example, due it's a daily ts, I think it's ok put frequency = 365. – s__ Dec 25 '18 at 10:21
  • 1
    Well i ll ask new question)) Thanks – psysky Dec 25 '18 at 10:24
  • may i ask you to help me in related topic based on your script? https://stackoverflow.com/questions/54158997/calculation-mape-metric-using-forecast-by-group-in-r – psysky Jan 12 '19 at 13:17
  • @G-spot sorry I could not log for a decent amount of time, but now your question is disappeared. – s__ Jan 14 '19 at 16:10