I am trying to build a predictive model from survey data. My DVs are NPS questions and similar data points. My IVs are mainly demographic questions. I keep getting a "variable lengths differ" error from the following code:

Model <- lm(Q6 ~ amount_spent + first_time + gender + 
                 workshop_participation + adults + children + 
                 household_adults + Below..25K. + X.25K.to..49K. + 
                 X.50K.to..74K. + X.75K.to..99K. + X.100K.to..124K. + 
                 X18.24. + X25.34. + X35.44. + X45.64.,
            data = diy_festival2)

Here is the error: Error in model.frame.default(formula = Q6 ~ amount_spent + first_time + : variable lengths differ (found for 'Below..25K.')

What are some possible causes and what are some potential fixes I can try?

InfiniteFlash
ngarn
  • try omitting missing values from your dataset using `na.omit()` – InfiniteFlash Feb 21 '18 at 19:19
  • It's because the length of your predictor variables is not equal to the length of your target variable. If you define `y<-matrix(c(1.1,2.2))` and `x<-matrix(c(3.1,3.2,3.3))` and then run `lm(y~x)`, you will get the same error, because `x` has three values and `y` has two. – antonioACR1 Feb 21 '18 at 19:38
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions – MrFlick Feb 21 '18 at 19:39

1 Answer

Your formula references at least one variable that is not a column of diy_festival2. R then looks for it in the global environment, where an object of that name exists with a different length. The error message suggests the variable is Below..25K.

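x <- data.frame(x1 = rnorm(100))   # x1 is a column of the data frame (100 values)
x2 <- rnorm(10)                    # x2 exists only in the global environment (10 values)
model.matrix(~ x1 + x2, data = x)  # x2 is not found in x, so the global x2 is used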

This gives the same error you have: variable lengths differ (found for 'x2').
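To find out which variables are affected, one option is to compare the names used in the formula with the column names of the data frame. The following is only a sketch: `df` is a made-up stand-in for diy_festival2 (I don't have your data), and the stray global `Below..25K.` vector is an assumption about how the mismatch might have arisen.

```r
# Hypothetical stand-in for diy_festival2; the real data is not available here.
set.seed(1)
df <- data.frame(Q6 = rnorm(100), amount_spent = rnorm(100))

# Assumed cause: a vector of the same name left over in the global environment,
# with a different length than the data frame's columns.
Below..25K. <- rnorm(10)

f <- Q6 ~ amount_spent + Below..25K.

# Variables named in the formula that are NOT columns of the data frame
missing_vars <- setdiff(all.vars(f), names(df))
print(missing_vars)

# Fitting the model fails because lm() falls back to the global vector
fit_error <- tryCatch(lm(f, data = df), error = function(e) conditionMessage(e))
print(fit_error)
```

Anything listed in `missing_vars` is being picked up from outside the data frame. In that case, check `names(diy_festival2)` for the exact spelling of the column, or add the column to the data frame before fitting.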

AdamO