running lmer with a by/group by statement?

Question

I'm trying to find a quick way to run a lmer model but run it separately for each grouping variable (in SAS one can use the by= statement). I have tried using dplyr for this with a code I found:

t1<- mod1 %>% group_by(c) %>% do(function(df){lmer(m1.formula,data=df)})

but that doesn't seem to work.

Anyone know how to do this using dplyr or another method?

Why would you want to do that? With grouping data like this you *ignore* the structure of the data while mixed models are made for including this structure in your model. — Tim, Jan 19 '15 at 12:39
Hi guys and thx for the answer. what i want is to run separate mixed models for each group ("station" here). Ben: im not sure what you mean by reproducable (sorry im a neewb). do you mean the output or the DF itself? this is the error i get: > t1<- m1.all %>% group_by(stn) %>% do(function(df){lmer(m1.formula,data=df)}) Error: Don't know how to handle type pairlist im using latest dplyr 1.4 thx Z — zeltak brisbane, Jan 20 '15 at 13:34
See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for how to provide a reproducible example. It allows us to try things out in order to help. — LJW, Jan 21 '15 at 00:46

Ben Bolker · Answer 1 · 2015-02-05T20:02:54.007

library("lme4")
data(Orthodont,package="nlme")

There are two fundamental issues you might want to consider here:

statistical: as commented above, it's a bit odd to think about running mixed models separately on each stratum (grouping variable) within a data set. Usually the entire point of mixed models is to fit a model to the combined data set, although I can certainly imagine exceptions (below, I fit separate mixed models by sex). You might be looking for something like the lmList function (both nlme and lme4 have versions), which runs (generalized) linear models (not mixed models) on each stratum separately. This makes more sense, especially as an exploratory technique.
computational: doing specifically what you asked for in the dplyr framework is a bit difficult, because the basic dplyr paradigm assumes that you are operating on data frames (or data tables), possibly grouped, and returning data frames. That means that the bits returned by each operation must be data frames (not e.g. merMod model objects). (@docendodismus points out that you can do it by specifying do(model = ...) in the code below, but I think the structure of the resulting object is a bit weird and would encourage you to rethink your question, as illustrated below)

In base R, you can just do this:

lapply(split(Orthodont,Orthodont$Sex),
       lmer,formula=distance~age+(1|Subject))

or

by(Orthodont,Orthodont$Sex,
       lmer,formula=distance~age+(1|Subject))

Digression: If you want to fit linear (unmixed) models to each subject, you can use

## first remove 'groupedData' attributes from the data, which don't
## work with lme4's version of lmList
Orthodont <- data.frame(Orthodont) 
lmList(distance~age|Subject,Orthodont)
## note: not including Sex, which doesn't vary within subjects

Return to main thread: In the plyr (ancestor of dplyr) framework you can fit separate mixed models by sex slightly more compactly:

library("plyr")
dlply(Orthodont,.(Sex),
       lmer,formula=distance~age+(1|Subject))

detach("package:plyr")

If you want to do it in plyr, you seem to need do() (I thought I could do without it, but I haven't found a way), and you need to create a function that returns a summary as a data frame.

library("dplyr")

Orthodont %>% group_by(Sex) %>%
    do(lmer(.,formula=distance~age+(1|Subject)))

produces

## Error: Results are not data frames at positions: 1, 2

You could do this:

 lmer_sum <- function(x,...) {
      m <- lmer(x,...)
      c(fixef(m),unlist(VarCorr(m)))
        data.frame(rbind(c(fixef(m),unlist(VarCorr(m)))),
                   check.names=FALSE)
 }

(unlist(VarCorr(m)) gives the RE variance of the single scalar random effect; the whole data.frame(rbind(...)) thing is needed to convert a numeric vector into a one-row data frame; check.names=FALSE preserves the column name (Intercept))

 Orthodont %>% group_by(Sex) %>%
    do(lmer_sum(.,formula=distance~age+(1|Subject)))

which gives reasonable results.

I can't currently test it but I think this would work in dplyr: `Orthodont %>% group_by(Sex) %>% do(model = lmer(.,formula=distance~age+(1|Subject)))` although you would need to extract the entries in the list columns later on (All I changed was to include `model = ` inside `do()`) — talat, Feb 04 '15 at 22:54
@docendodiscimus: it does work, but it produces a pretty unusual (or at least pretty familiar to me) object ... — Ben Bolker, Feb 05 '15 at 01:30

score 0 · Answer 2 · answered Feb 04 '15 at 22:23

The problem is that you're calling do() wrongly - it doesn't work with anonymous functions like that. Arguments in do() are evaluated in the context of the data, so when you say function(df), do will try to use the df column of the data. It doesn't have that column, so it fails (with a cryptic message).

You can refer to the entire data frame in the grouping with ., and you don't need the anonymous function. You just call the (nested) functions directly with the . variable:

t1 <- mod1 %>% group_by(c) %>% do(lmer(m1.formula, .))

Untested because you didn't provide a reproducible example.

running lmer with a by/group by statement?

2 Answers2