I'm trying to generate predictions from a fitted model, but predict throws an error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'Welfare.Measurment')

The test and train data have the same structure, with identical variable names and types. I even tried to rbind the two data frames, but the error persists.

Here is the code:

model3 <- lm(log(Poverty.Line.Day) ~ (log(data_abs$Median)) + 
              Welfare.Measurment + Control, data=data_abs)

predicted_poverty_Line <- 
  exp(predict(model3, dataF))*exp((summary(model3)$sigma)^2/2)
jay.sf
  • First of all, we don't have your data. Please create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with `dput(data)` or `dput(head(data, 10))`. Second, your `log(data_abs$Median)` should be written without the `data_abs$` part. – phiver Apr 20 '22 at 08:24

1 Answer

In lm, do not use $ inside the formula when you also supply the data= argument.

fit1 <- lm(y ~ train$X1 + X2, data=train)  ## predict will fail
predict(fit1, newdata=test)
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#   variable lengths differ (found for 'X2')

fit2 <- lm(y ~ X1 + X2, data=train)  ## predict will work
predict(fit2, newdata=test)

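Reason: If you use e.g. train$X1 in the formula, that term is bound to the vector inside the train data frame instead of being looked up by name, so predict keeps using the old values even when you supply newdata=. The remaining variables are taken from newdata, and unless the old vector happens to have the same length as newdata has rows, you get this error. If the lengths do happen to match, there is no error, but the predictions are computed from the old values.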
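The silent case is easy to miss, so here is a minimal sketch of it. It assumes simulated data in which train and test deliberately have the same number of rows; the names fit_bad, fit_good, p_bad, p_good and by_hand are only illustrative. It rebuilds by hand what the $ model computes and checks that it used train's X1 together with test's X2.

```r
set.seed(42)
n <- 40
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$y <- 1 + dat$X1 + rnorm(n)
train <- dat[1:20, ]
test  <- dat[21:40, ]  # same number of rows as train, on purpose

fit_bad  <- lm(y ~ train$X1 + X2, data = train)  # '$' inside the formula
fit_good <- lm(y ~ X1 + X2, data = train)        # plain column names

p_bad  <- predict(fit_bad,  newdata = test)  # runs without error
p_good <- predict(fit_good, newdata = test)

# Rebuild by hand what the '$' model computed: train's X1 with test's X2
b <- coef(fit_bad)
by_hand <- b[1] + b[2] * train$X1 + b[3] * test$X2

# The '$' model reused the training values of X1 ...
stopifnot(isTRUE(all.equal(unname(p_bad), unname(by_hand))))
# ... so its predictions differ from the correct ones
stopifnot(!isTRUE(all.equal(unname(p_bad), unname(p_good))))
```

Both checks pass quietly, which is the point: when the lengths line up, nothing signals that the predictions are wrong.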


Data:

n <- 60
set.seed(42)
dat <- data.frame(X1=rnorm(n), X2=rnorm(n))
dat <- transform(dat, y=1 + X1 + rnorm(n))
train <- dat[1:20, ]
test <- dat[21:n, ]
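
Applied to the model from the question, the fix would look roughly like the sketch below. The original data were never posted, so everything in sim is simulated: only the column names and the object names data_abs, dataF and model3 come from the question, while the factor levels "consumption" and "income" and all coefficients are made up for illustration.

```r
# Hypothetical stand-in data: column names follow the question, values are simulated
set.seed(1)
n <- 100
sim <- data.frame(
  Median             = rlnorm(n, meanlog = 1),
  Welfare.Measurment = factor(sample(c("consumption", "income"), n, replace = TRUE)),
  Control            = rnorm(n)
)
sim$Poverty.Line.Day <- exp(0.2 + 0.5 * log(sim$Median) + 0.1 * sim$Control +
                              rnorm(n, sd = 0.2))
data_abs <- sim[1:70, ]    # training rows
dataF    <- sim[71:100, ]  # rows to predict for

# log(Median) without 'data_abs$': the column is looked up in 'data' when
# fitting and in 'newdata' when predicting
model3 <- lm(log(Poverty.Line.Day) ~ log(Median) + Welfare.Measurment + Control,
             data = data_abs)

# Same back-transformation as in the question
predicted_poverty_Line <-
  exp(predict(model3, newdata = dataF)) * exp(summary(model3)$sigma^2 / 2)

# One prediction per row of dataF
stopifnot(length(predicted_poverty_Line) == nrow(dataF))
```

Transformations such as log() are fine inside the formula; only the data_abs$ prefix has to go.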
jay.sf