
I'm trying to run an ANOVA test on two different models in R: an lm model vs. a kknn model. The problem is that this error appears:

Error in anova.lmlist(object, ...) : models were not all fitted to the same size of dataset

This comparison makes sense to me because I want to know whether there is statistical evidence of a difference between the models. Here is a reproducible example:

#Getting dataset
xtra <- read.csv("california.dat", comment.char="@")
names(xtra) <- c("Longitude", "Latitude", "HousingMedianAge",
"TotalRooms", "TotalBedrooms", "Population", "Households",
"MedianIncome", "MedianHouseValue")
n <- length(names(xtra)) - 1
names(xtra)[1:n] <- paste("X", 1:n, sep="")
names(xtra)[n+1] <- "Y"

#Regression model
reg.model<-lm(Y~.,data=xtra)

#Knn-model (requires the kknn package)
library(kknn)
knn.model <- kknn(Y~., train=xtra, test=xtra, kernel="optimal")

anova(reg.model,knn.model)

What am I doing wrong?

Thanks in advance.

Carlos
  • You can get `california.dat` from here: http://sci2s.ugr.es/keel/dataset/data/regression/california.zip – Carlos Nov 20 '17 at 10:34
  • You'll most likely find your answer here: https://stackoverflow.com/questions/18387258/r-error-which-says-models-were-not-all-fitted-to-the-same-size-of-dataset – AntoniosK Nov 20 '17 at 10:54
  • Thanks for your response, @AntoniosK, but my dataset has no `NA` values in any column, so that answer is not useful for me. – Carlos Nov 20 '17 at 11:04
  • If I `predict`, I get different values from the two models, but none of them is `NA`. – Carlos Nov 20 '17 at 11:21
  • Yes, I checked it myself as well, and that's why I removed the suggestion. It's definitely not an NA issue here :-) – AntoniosK Nov 20 '17 at 11:24

1 Answer


My guess would be that the two models aren't comparable with anova(), and this error is thrown because one of the models is deemed empty.

From the documentation for anova(object,...):

  • object - an object containing the results returned by a model fitting function (e.g., lm or glm).

  • ... - additional objects of the same type.

Checking whether the models can be compared, you can see they're of different classes:

> class(knn.model)
[1] "kknn"
> class(reg.model)
[1] "lm"

Probably more importantly, if you try to run anova() on knn.model alone, you can see that the function cannot be applied to a kknn object:

> anova(knn.model)

Error in UseMethod("anova") : 
  no applicable method for 'anova' applied to an object of class "kknn"
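For contrast, a minimal sketch of what anova() is designed for: comparing nested lm() fits on the same data (illustrated here with the built-in mtcars dataset, not the question's california.dat):

```r
# Two nested linear models fitted to the same data frame
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# F-test for the contribution of the added predictor (hp)
anova(m1, m2)
```

Both objects are of class "lm" and were fitted to the same rows, which is exactly what the error message about "the same size of dataset" is checking for.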
Andrew Haynes
  • Well, both models can `predict`; in fact, `knn.model` includes predictions in `knn.model$fitted.values`, so I think it shouldn't be treated as empty. Anyway, the problem could be what you say: a `kknn` object is not compatible with `anova()`. So what would you do to get a statistical comparison (meaningfully similar to anova)? – Carlos Nov 20 '17 at 11:20
  • What I want to do is simply compare the two models. I can get an MSE error from both models in a regression problem. So if I perform cross-validation for the two models, I get two different "numbers". I need `anova` in order to check (via p-value) whether the errors differ statistically, that is, whether one model is (statistically) better than the other. – Carlos Nov 20 '17 at 11:30
  • @Carlos you can get an error for each prediction (pred - actual). And therefore a squared error. Assuming you predict on a dataset with N rows, you'll get N squared errors for each model. You can use a statistical test to compare those averages (which will be the mean squared error of each model). Like a `t.test`. – AntoniosK Nov 20 '17 at 12:03
  • In the end I used the Wilcoxon test, which compares the two models and gives a p-value. Thanks anyway! – Carlos Nov 20 '17 at 15:14
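The approach from the comments can be sketched in code: compute per-observation squared errors for both models on the same rows, then run a paired test on those error vectors. This is a sketch, not the asker's exact code; it simulates a small data frame in place of the question's california.dat (which is not bundled here) and assumes the kknn package is installed:

```r
library(kknn)
set.seed(1)

# Simulated stand-in for the question's xtra (california.dat not available here)
xtra <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
xtra$Y <- 3 * xtra$X1 - xtra$X2 + rnorm(200)

reg.model <- lm(Y ~ ., data = xtra)
knn.model <- kknn(Y ~ ., train = xtra, test = xtra, kernel = "optimal")

# Per-row squared errors for each model on the same observations
err.lm  <- (xtra$Y - fitted(reg.model))^2
err.knn <- (xtra$Y - knn.model$fitted.values)^2

# Paired Wilcoxon signed-rank test (no normality assumption on the errors),
# as the asker ultimately used; t.test(err.lm, err.knn, paired = TRUE)
# would be the parametric alternative suggested by AntoniosK
wilcox.test(err.lm, err.knn, paired = TRUE)
```

The paired test is appropriate because the two error vectors come from the same observations, row by row; an unpaired test would ignore that structure.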