
I have seen that others have had this problem too, but I didn't really understand the answers given.

I fitted some linear mixed models, starting with the "intercept only" model, and then wanted to add more variables. When I try to compare the models, the R output is "models were not all fitted to the same size of dataset". What do I have to do to fit both models to the same dataset?

The R syntax is:

mod_zero <- lmer(quality ~ 1 + (1|subject_id))
summary(mod_zero)
mod_one <- lmer(quality ~ ps + an + int + ch + boredom + (1|subject_id),dat)
summary(mod_one)
anova(mod_zero, mod_one)

Adding na.rm=T did not help. Does anyone have an idea?

Andrea
Try to format your question (there are formatting options for code). Adding the tag R will also help a lot. Also try to link to the questions that didn't help you and explain which part you didn't get. Finally, try to create a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – takje Mar 22 '17 at 18:04

1 Answer


The error is likely caused by missing data in one or more of the predictors in the second model. Observations with missing values are dropped when fitting the second model (thereby creating a different dataset that is a subset of the original data), and you cannot meaningfully compare two models that were fit to different datasets. To compare both models, you'll have to fit the first model to a dataset without missing data on ps, an, int, ch, and boredom. Try:

library(lme4)

# Keep only the rows with complete data on all predictors used in mod_one
dat2 <- dat[complete.cases(dat[, c('ps', 'an', 'int', 'ch', 'boredom')]), ]

mod_zero <- lmer(quality ~ 1 + (1|subject_id), dat2)
mod_one  <- lmer(quality ~ ps + an + int + ch + boredom + (1|subject_id), dat2)

anova(mod_zero, mod_one)

This solves the error, but you should ask yourself why there is missing data. Removing missing data could bias your results depending on the missing data mechanism. If you have a lot of missing data that is systematically related to your outcome variable this will bias your model estimates and you'll need to look into ways of reducing this bias (e.g. multiple imputation). Graham has written a lot of books and articles that explain different missing data mechanisms and solutions. Comparing the output of mod_zero on dat and dat2 may give a first indication of possible bias (although a similar output does not ensure the absence of bias).
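A minimal sketch of such a check, assuming the dat2 subset defined above (the model names mod_zero_full and mod_zero_cc are only illustrative), could look like this:

# Refit the intercept-only model on the full data and on the complete-case
# subset, then compare the estimates side by side.
mod_zero_full <- lmer(quality ~ 1 + (1|subject_id), dat)
mod_zero_cc   <- lmer(quality ~ 1 + (1|subject_id), dat2)

fixef(mod_zero_full); fixef(mod_zero_cc)       # fixed-effect (intercept) estimates
VarCorr(mod_zero_full); VarCorr(mod_zero_cc)   # variance components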

Niek
  • Thank you very much! That really helped! I now have an idea of how to build the R syntax for the other model comparisons. :-) You made my day! :-) Regarding the missing data: for ps, an, int, ch, and boredom there are 1071 entries each. For ps and an there are 9 missing values, for ch 6, and none for int. You are of course right that missing values have an effect on results; here they are below 1%. If there are no data, then people did not give a response to that question. – Andrea Mar 22 '17 at 16:03