
I am trying to get the CVlm function from the DAAG package to work on my dataset, fit, which has 27 entries (rows) and 6 variables. I use the following expression in R:

CrossVal<-CVlm(df=fit,m=3,
form.lm=formula(fit$X1~fit$X2 + fit$X3 + fit$X4 + fit$X5 + fit$X6))

With m=1 it works fine, but for m different from 1 (e.g. 3, as shown above) I get an error message:

Error in `[<-.data.frame`(`*tmp*`, rows.out, "cvpred", value = c(228.541323416399,  : 
  replacement has 27 rows, data has 9
In addition: Warning message:
'newdata' had 9 rows but variable(s) found have 27 rows 

I would be grateful for some help getting the cross-validation to work properly. Thanks in advance.

agstudy
Lars Carlsen
Please make your situation reproducible, i.e. provide us with the data and the code needed to mimic your situation. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for more tips on how to do this. – Paul Hiemstra Mar 24 '13 at 13:45

1 Answer


It is hard to say without a reproducible example, but I think the formula is not written correctly: it should use bare column names rather than fit$... references. This should work for you:

 CrossVal<-CVlm(df=fit,m=3,
                form.lm= formula(X1 ~ X2 + X3 + X4 + X5 + X6))

For example, using the houseprices data from DAAG, I can reproduce the error:

 CVlm(df = houseprices, form.lm =
        formula(houseprices$sale.price ~ houseprices$area), m=2)
Error in `[<-.data.frame`(`*tmp*`, rows.out, "cvpred", value = c(201.067581902091,  : 
  replacement has 15 rows, data has 7

but this works fine :

CVlm(df = houseprices, form.lm = formula(sale.price ~ area), m=2)
Analysis of Variance Table

Response: sale.price
          Df Sum Sq Mean Sq F value Pr(>F)  
area       1  18566   18566       8  0.014 *

EDIT: why m=1 works but other values of m do not:

Here is the part of the CVlm code where the error occurs:

subs.lm <- lm(form, data = df[rows.in, ])
df[rows.out, "cvpred"] <- predict(subs.lm, newdata = df[rows.out, ])

The error occurs because predict returns 27 values and we try to assign them to only 9 rows of df:

Error in `[<-.data.frame`(`*tmp*`, rows.out, "cvpred", value = c(228.541323416399,  : 
  replacement has 27 rows, data has 9

Indeed, this is a side effect of how the formula is written: because the variables are given with $, predict does not use the newdata object but evaluates them from the original data.frame. This is what the warning reports:

In addition: Warning message:
'newdata' had 9 rows but variable(s) found have 27 rows 
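
The same behaviour can be reproduced without DAAG. Below is a minimal sketch in base R; the built-in mtcars data and the 24/8 split are assumptions chosen for illustration and are not part of the question.

```r
# Base R only. mtcars ships with R and has 32 rows.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Formula written with `$`: the variables are tied to the full mtcars
# columns, whatever is passed as `data` or `newdata`.
bad_fit  <- lm(mtcars$mpg ~ mtcars$wt, data = train)
# predict() warns that 'newdata' had 8 rows but the variable found had 32;
# the warning is suppressed here only to keep the output quiet.
bad_pred <- suppressWarnings(predict(bad_fit, newdata = test))
length(bad_pred)   # one value per row of mtcars, not per row of test

# Formula written with bare column names: they are looked up in `data`
# when fitting and in `newdata` when predicting.
good_fit  <- lm(mpg ~ wt, data = train)
good_pred <- predict(good_fit, newdata = test)
length(good_pred)  # one value per row of test
```

Assigning bad_pred to the 8 held-out rows would fail in the same way as the CVlm line quoted above.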

With m=1 it works because newdata has the same number of rows as the original data set. Of course, the result is still not correct, because predict does not use the newdata subset, which is a permutation of the original data.
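
To make the mechanics concrete, here is a rough sketch of an m-fold loop built around the two lines quoted from CVlm, using bare column names. It is not DAAG's implementation: the synthetic data frame standing in for fit, the fold assignment, and the seed are all assumptions.

```r
# Hypothetical stand-in for the asker's data: 27 rows, columns X1..X6.
set.seed(1)
fit <- as.data.frame(matrix(rnorm(27 * 6), nrow = 27))
names(fit) <- paste0("X", 1:6)

m <- 3
# Assign each row to one of m folds at random (assumed scheme).
fold <- sample(rep(seq_len(m), length.out = nrow(fit)))
fit$cvpred <- NA_real_

for (i in seq_len(m)) {
  rows.in  <- fold != i   # rows used to fit the model
  rows.out <- fold == i   # rows held out for prediction
  subs.lm <- lm(X1 ~ X2 + X3 + X4 + X5 + X6, data = fit[rows.in, ])
  # With bare names, predict() reads the predictors from newdata,
  # so it returns exactly one value per held-out row.
  fit[rows.out, "cvpred"] <- predict(subs.lm, newdata = fit[rows.out, ])
}
```

Each pass fills 9 rows of cvpred with 9 predictions, so the assignment that fails in the question succeeds here.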

agstudy