
I have the following code to make a day-ahead prediction of load consumption at 15-minute intervals, using outside air temperature (oat) and TOD (time of day, a categorical variable with 96 levels). When I run the code below, I get the following errors.

  i = 97:192
  formula = as.formula(load[i] ~ load[i-96] + oat[i])
  model = glm(formula, data = train.set, family=Gamma(link=vlog()))   

The last line, the call to glm(), gives this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

And calling predict() gives the following warnings:

Warning messages:
1: In if (!se.fit) { :
  the condition has length > 1 and only the first element will be used
2: 'newdata' had 96 rows but variable(s) found have 1 rows 
3: In predict.lm(object, newdata, se.fit, scale = residual.scale, type = ifelse(type ==  :
  prediction from a rank-deficient fit may be misleading
4: In if (se.fit) list(fit = predictor, se.fit = se, df = df, residual.scale = sqrt(res.var)) else predictor :
  the condition has length > 1 and only the first element will be used
Python_R
  • What does `i` represent? Are you trying to fit a single model, or (192-97+1) = 96 models? – Hong Ooi Jun 23 '13 at 14:54
  • i indexes the 15-minute intervals. train.set = dt.new[1:192,] and test.set = dt.new[193:288,]. – Python_R Jun 23 '13 at 15:01
  • So... you want to fit the model using a subset of rows from your data frame? And where does `link=vlog` come from? – Hong Ooi Jun 23 '13 at 15:02
  • Yes, but the subset from the previous day will be used continuously for a day-ahead prediction. I wrote vlog() to get rid of an error I was getting when using Gamma. – Python_R Jun 23 '13 at 15:04
  • @agstudy Sorry, but I don't understand what you mean. – Python_R Jun 23 '13 at 15:09
  • @Python_R You should give some data to reproduce your error. See [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – agstudy Jun 23 '13 at 15:11
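To make the indexing in these comments concrete: with 96 fifteen-minute intervals per day, rows 1:192 of `dt.new` are the first two days and rows 193:288 are the third. A small sketch (the helper names `day_of_row` and `tod_of_row` are hypothetical, not from the question) that maps a row number to its day and time-of-day slot, and to the row 96 steps earlier that a day-ahead lag would use:

```r
# Hypothetical helpers: 96 fifteen-minute intervals per day (24 * 4).
intervals_per_day <- 96

# Day number (1, 2, 3, ...) for each row index.
day_of_row <- function(row) (row - 1) %/% intervals_per_day + 1

# Time-of-day slot (1..96) for each row index.
tod_of_row <- function(row) (row - 1) %% intervals_per_day + 1

train_rows <- 1:192    # first two days, as in train.set = dt.new[1:192,]
test_rows  <- 193:288  # third day, as in test.set = dt.new[193:288,]

# Same time slot one day earlier: the rows a day-ahead lag points to.
lag_rows <- test_rows - intervals_per_day

range(day_of_row(test_rows))  # 3 3
range(lag_rows)               # 97 192
```

Every test row's lag falls inside the training rows, which is what makes a one-day lag usable for a day-ahead forecast.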

1 Answer


You're doing things in a rather roundabout fashion, and one that doesn't translate well to making out-of-sample predictions. If you want to model on a subset of rows, then either subset the data frame you pass as the data argument, or use glm's subset argument.

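  # one-day lag (96 intervals): row i gets the load from row i - 96
  train.set$load_lag <- c(rep(NA, 96), train.set$load[1:96])
  # fit on rows 97:192; "..." stands for the remaining glm() arguments, such as family
  mod <- glm(load ~ load_lag*TOD, data=train.set[97:192, ], ...)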
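A fuller sketch of the same idea, under stated assumptions: the data are simulated (three days of 15-minute data with hypothetical columns `load`, `oat`, `TOD`); the lag is built on the whole data frame before splitting, so the test day also gets its lag values; `TOD` is deliberately left out of the formula because of the degrees-of-freedom problem with its 96 levels; and the family is `Gamma(link = "log")`, the standard way to request a log link for the Gamma family in R. That this is what the asker's `vlog()` was meant to achieve is an assumption.

```r
set.seed(1)

# Hypothetical stand-in for dt.new: 3 days x 96 fifteen-minute intervals.
n_per_day <- 96
n <- 3 * n_per_day
slot <- rep(1:n_per_day, times = 3)
dt.new <- data.frame(
  TOD = factor(slot),
  oat = 15 + 8 * sin(2 * pi * slot / n_per_day) + rnorm(n)
)
# Positive, right-skewed load, so a Gamma family is plausible.
dt.new$load <- exp(3 + 0.03 * dt.new$oat + rnorm(n, sd = 0.1))

# One-day lag computed on the full series, so the test day (rows 193:288)
# also gets the previous day's load as a predictor.
dt.new$load_lag <- c(rep(NA, n_per_day), head(dt.new$load, -n_per_day))

train_rows <- (n_per_day + 1):(2 * n_per_day)  # day 2; day 1 has no lag
test_rows  <- (2 * n_per_day + 1):n            # day 3

# Option 1: glm's subset argument.
mod <- glm(load ~ load_lag + oat, data = dt.new, subset = train_rows,
           family = Gamma(link = "log"))

# Option 2: subset the data frame passed as data.
mod2 <- glm(load ~ load_lag + oat, data = dt.new[train_rows, ],
            family = Gamma(link = "log"))

all.equal(coef(mod), coef(mod2))  # TRUE: both options fit the same rows

# Day-ahead, out-of-sample prediction on the response (load) scale.
pred <- predict(mod, newdata = dt.new[test_rows, ], type = "response")
length(pred)  # 96, one per interval of day 3
```

Because the formula uses plain column names rather than `load[i]`-style indexing, `predict()` can look up `load_lag` and `oat` in `newdata` row by row.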

You also need to rethink exactly what you're doing with TOD. If it has 96 levels, then you're fitting (at least) 96 degrees of freedom on 96 observations, which leaves nothing to estimate error from and won't give you a sensible outcome.
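To put numbers on the degrees-of-freedom point, here is a sketch with simulated data (the values are made up; only the shape matches the answer: 96 training rows, one per level of a 96-level `TOD`). It uses `lm` rather than a Gamma `glm` for simplicity, since the number of model-matrix columns does not depend on the family. The interaction `load_lag*TOD` produces 192 columns for 96 rows, so the fit is rank deficient, the kind of condition the "prediction from a rank-deficient fit" warning in the question refers to.

```r
set.seed(2)

# Toy data shaped like the training subset: 96 rows, one per TOD level.
d <- data.frame(
  TOD = factor(1:96),
  load_lag = runif(96, 20, 40)
)
d$load <- d$load_lag + rnorm(96)

# Intercept + load_lag + 95 TOD dummies + 95 interaction terms.
X <- model.matrix(load ~ load_lag * TOD, data = d)
dim(X)  # 96 192: more parameters than observations

fit <- lm(load ~ load_lag * TOD, data = d)
fit$rank          # 96: only 96 of the 192 coefficients are estimable
df.residual(fit)  # 0: the fit passes through every training point
```

With zero residual degrees of freedom the model simply reproduces the training day and carries no reliable information about a new one.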

Hong Ooi
  • I was trying to think of a way to precisely explain how badly `formula = as.formula(load[i] ~ load[i-96]*TOD[i] + oat[i])` misunderstands how formulas work, but the best I could come up with was to point out that this results in fitting a model to a single data point. – joran Jun 23 '13 at 15:23
  • @Hong Ooi Thank you for your help. I will try it in a bit and I will let you know if it worked for me or not. – Python_R Jun 24 '13 at 05:11
  • @joran what suggestion do you have to fit the model correctly? – Python_R Jun 24 '13 at 05:12
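joran's point can be illustrated with toy data (the columns and values below are hypothetical, not the asker's). If the model ends up being fitted to a single row, for example because the formula is evaluated with one scalar index, then a factor such as `TOD` has only one observed level. `glm()` drops unused factor levels when building the model frame, so the factor is left with one level and R stops with the same contrasts error quoted in the question:

```r
# Toy data: three rows, TOD as a factor.
d <- data.frame(
  load = c(10, 12, 11),
  oat  = c(14, 15, 16),
  TOD  = factor(c(1, 2, 3))
)

# Keep only one row: TOD now has a single observed value.
one_row <- d[2, ]

msg <- tryCatch(
  glm(load ~ oat + TOD, data = one_row, family = Gamma(link = "log")),
  error = function(e) conditionMessage(e)
)
msg  # "contrasts can be applied only to factors with 2 or more levels"
```

Fitting on a block of rows with plain column names in the formula, as in the answer, avoids this, because every factor then has as many levels as appear in those rows.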