2

I want to see whether the fixed effect Group2 in my model is significant. The model is:

Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)

To check the significance I create a null model not containing the effect Group2:

Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
    data=mydata,REML=FALSE)

and the full model containing the effect Group2:

Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject), 
    data=mydata,REML=FALSE)

Then I use anova() to compare the two, but I get an error:

anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) : 
##   models were not all fitted to the same size of dataset

I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data. How can I solve this problem and compare the two models?

Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?


The data can be downloaded here. Please note that you should replace "<undefined>" with NaN like this:

mydata = read.csv("mydata.csv")

mydata[mydata == "<undefined>"] <- NA
landroni
  • 2,902
  • 1
  • 32
  • 39
gabboshow
  • 5,359
  • 12
  • 48
  • 98
  • Could you share a reproducible version of mydata (for example, with `dput(mydata)`)? If it's too large, or too private, see if you can create a subset of the data and reproduce it (e.g. with `dput(head(mydata, 10))`), or reproduce it on public data. Otherwise it's impossible to tell why the function failed in your case – David Robinson Sep 10 '15 at 17:44
  • I added the link to download the data. Thanks! – gabboshow Sep 10 '15 at 17:50
  • Please note that the problem (and the missing values) is in Group2...I edited the question – gabboshow Sep 10 '15 at 17:53
  • 2
    Yes, you need to remove the rows of missing values so both models are fit to the same dataset to use `anova.merMod`. You could create a new data object without the missing values and feed this to both model calls or you can add the subset argument to both models - something like `subset = !is.na(Group1)`. – aosmith Sep 10 '15 at 18:12
  • Can I use the full model (comprising rows containing NaN) to describe the response or should I always refer to the "submodel"? – gabboshow Sep 10 '15 at 18:16
  • @aosmith you shouldn't need `anova.mermod`. Method dispatch will find the appropriate generic function. The proper call is `anova`. – alexwhitworth Sep 14 '15 at 16:59
  • @Alex I mention `anova.merMod` specifically because the behavior that the subset/dataset must be identical for the full and reduced model isn't standard for `anova` methods of other objects I commonly use (`anova.lm` and `anova.lme` spring to mind) so this recommendation of using `subset` in both is specific to `anova.merMod`. – aosmith Sep 14 '15 at 17:13
  • @aosmith this (subset) doesn't matter--see the function definition: `anova <- function (object, ...) {UseMethod("anova")}` If you supply two `merMod` objects to `anova`, method dispatch will find the appropriate generic function (ie- `anova.merMod`). Certainly, you can make the call directly, but it's generally not the preferred practice. – alexwhitworth Sep 14 '15 at 17:53
  • Using `subset = !is.na(Group1)` or `na.omit(data)` would probably lose cases, as `lmer` imputes missing data and has probably more observations than just the complete case deletion. – Daniel May 27 '16 at 06:28

2 Answers2

1

To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.

There are two simple ways to do this, and while this reproducible example uses lm and update, for lmer objects the same approach should work:

# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE){
    update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}

# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA

xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     26 256.91                              
## 2     27 301.32 -1   -44.411 4.4945 0.04371 *
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     26 256.91                              
## 2     27 301.32 -1   -44.411 4.4945 0.04371 *
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If however you're only interested in testing a single variable (e.g. Group2), then perhaps the Anova() or linearHypothesis() in car would work as well for this usecase.

See also:

Community
  • 1
  • 1
landroni
  • 2,902
  • 1
  • 32
  • 39
0

Fit Resp.model first, then use Resp.model@frame as data argument.

Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model@frame,REML=FALSE)
Daniel
  • 7,252
  • 6
  • 26
  • 38
  • better to use the accessor method `model.frame(Resp.model)` rather than accessing slots directly ... note (both this answer and the other one) that this approach will fail when the formula involves complex bases (`poly()`, `ns()`, `bs()` or data transformations (`log(x)`) – Ben Bolker May 26 '16 at 21:56
  • @BenBolker Do you mean the `Resp.model@frame` approach or the `model.frame(Resp.model)` approach? And why would it fail in case of say `log(x)` transformations? – landroni May 26 '16 at 22:13
  • either. If you have a term `log(x)` in the formula, then the model frame will actually contain a column called `log(x)`. Then, when `lmer` tries to evaluate the term `log(x)` in the formula in the *new* model, it will try to find an `x` column in the data and will fail. – Ben Bolker May 26 '16 at 22:21
  • Yes, `model.frame` should usually be used, especially when you write your own function or packages. However, for a quick analysis, I sometimes use the way that requires less typing (i.e. fit@frame instead of model.frame(fit)) :-) – Daniel May 27 '16 at 06:29