lme4 + Analysis by Group

Question

I have a file that I am working with that contains State Math Test Scores for Grades 4-8 for multiple schools. I am planning on using the lmer command from lme4 to run a separate unconditional model for each grade level (I am doing this because the exams have different scales). I created a for-loop that splits the original data file into grade-specific dataframes:

for(i in levels(MathTestData$CURR_GRADE_LVL)){
  assign(paste("MathTest.",i,sep=""), MathTestData[MathTestData$CURR_GRADE_LVL==i, ])
}

Is there a way to use a loop command that can run an unconditional model for each grade-level dataframe? For the purposes of this example, let's call the dependent variable "MathScore" and the Level-2 id "SchoolID."

Since the grade levels are between 4 and 8, I tried the following for-loop, but it did not work (I get the error: "Error: 'data' not found, and some variables missing from formula environment"):

for(i in 4:8){
  UnconditionalModel.i <- lmer(MathScore~1+(1|SchoolID), data=MathTest.i)
  summary(UnconditionalModel.i)
}

The main issue I have is that I understand one can use lapply on a list as suggested by others, but by doing so, you end up losing a lot of valuable information that would otherwise be provided by the lme4 and lmerTest packages (specifically p-values and test statistics for the fixed effects, and the random effects are presented as standard deviations as opposed to variances).

In general, you will find it difficult to work with sequentially named data frames. Better to keep them in a list: `MathTestList = split(MathTestData, MathTestData$CURR_GRADE_LVL)`. This will make it easy to loop over. See [How to make a list of data frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for more examples. (I would consider your question a duplicate of that one, except that in this particular use-case I think Denis's recommendation to not split at all and use multilevel regression is better.) — Gregor Thomas, Jan 18 '18 at 15:00
This really isn't a duplicate in terms of making lists from a dataset. As I explain up top, your method of analyzing a list in lme4 causes me to lose important statistical information that I would need to know. — B. Norris, Jan 18 '18 at 18:03
The `list` solution I propose and the multilevel regression solution that Denis proposes are mutually exclusive. Your problem is that in your loop `data=MathTest.i` doesn't parse the `i`. But if your data subsets are in a list (see my first comment for the command) you can use `data = MathTestList[[i]]` inside your loop. You should also assign the results to a list: `UncondModelList = list()` (before the loop) and the loop becomes `for (i in seq_along(MathTestList)) { UncondModelList[[i]] <- lmer(..., data = MathTestList[[i]]) }`. — Gregor Thomas, Jan 18 '18 at 18:39
Regarding your last paragraph, I can't imagine how any info would be lost by using a data frame in a list. `lmer` really has no way to know whether the data you give to it comes from a list or not. It certainly doesn't change its reporting of p-values and random effects. Where did you hear this? Have you attempted to verify it at all? — Gregor Thomas, Jan 18 '18 at 18:45
Do note that p-values for fixed effects have been removed from the recent versions of the `lme4` package because the authors feel they are not theoretically sound - this is regardless of whether the data is part of a list or not. — Gregor Thomas, Jan 18 '18 at 18:58
If you are having trouble using `lmer` on a list, I'd suggest opening a question specifically about that. We can help you work through any problems you're having there. — Gregor Thomas, Jan 18 '18 at 21:44
Yes, I have attempted to verify it. That's why I am asking the question because I am already aware of these limitations. — B. Norris, Jan 19 '18 at 19:07
I'd love to see an example of the different results depending on whether the data frames are in a list or not. I just did a simple test and they seem identical. You can [see my comparison in this gist](https://gist.github.com/gregorp/9dc9f07a86e98b3c0c8a2dce94554b5b). — Gregor Thomas, Jan 19 '18 at 19:39
This is helpful and what I was looking for information on. Thank you! — B. Norris, Jan 19 '18 at 20:15
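
For readers following the comment thread, here is a minimal sketch of the list-based approach described above. The simulated data frame is a hypothetical stand-in for `MathTestData` (the real file is not shown in the question); only the column names `MathScore`, `SchoolID`, and `CURR_GRADE_LVL` come from the question.

```r
library(lme4)

# Hypothetical stand-in for MathTestData: 5 grades x 20 schools x 15 students.
set.seed(1)
MathTestData <- expand.grid(
  student = 1:15,
  SchoolID = factor(1:20),
  CURR_GRADE_LVL = factor(4:8)
)
school_effect <- rnorm(20, sd = 5)
MathTestData$MathScore <- 300 +
  50 * as.integer(MathTestData$CURR_GRADE_LVL) +
  school_effect[as.integer(MathTestData$SchoolID)] +
  rnorm(nrow(MathTestData), sd = 10)

# One data frame per grade, held in a list named "4", "5", ..., "8".
MathTestList <- split(MathTestData, MathTestData$CURR_GRADE_LVL)

# Fit the unconditional model to each grade; the result is a list of fitted models.
UncondModelList <- lapply(MathTestList, function(d) {
  lmer(MathScore ~ 1 + (1 | SchoolID), data = d)
})

# Each element is an ordinary lmer fit, so the usual accessors work on it.
summary(UncondModelList[["4"]])

# Variance components per grade: the vcov column holds variances,
# the sdcor column holds standard deviations.
lapply(UncondModelList, function(m) as.data.frame(VarCorr(m)))
```

Because each list element is a regular fitted model, nothing in `summary()` depends on the data having come from a list. If `lmerTest` is attached before fitting, its version of `lmer` is used and the summaries should include its fixed-effect tests as usual.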

score 3 · Answer 1 · answered Jan 18 '18 at 14:31

I think you could use multilevel regression on your entire data frame.

lmer(MathScore ~ SchoolID + (1 + SchoolID | CURR_GRADE_LVL), data = MathTestData)

This way, the slope and intercept vary by grade. You can use ranef() to recover the coefficients for each grade.
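
As a rough illustration of reading per-grade estimates back out of a fitted model, the sketch below uses a deliberately simplified random-intercept model on simulated data. Both the data and the simpler formula are assumptions made for the example, not the model proposed above.

```r
library(lme4)

# Hypothetical data: 200 students in each of grades 4-8, with grade-specific shifts.
set.seed(2)
d <- expand.grid(student = 1:200, CURR_GRADE_LVL = factor(4:8))
grade_shift <- c(-20, -10, 0, 10, 20)
d$MathScore <- 500 + grade_shift[as.integer(d$CURR_GRADE_LVL)] +
  rnorm(nrow(d), sd = 10)

# Simplified model (assumption): random intercept for grade only.
fit <- lmer(MathScore ~ 1 + (1 | CURR_GRADE_LVL), data = d)

# ranef() returns each grade's deviation from the overall intercept.
grade_dev <- ranef(fit)$CURR_GRADE_LVL

# coef() returns the overall intercept plus that deviation, one row per grade.
grade_coef <- coef(fit)$CURR_GRADE_LVL

print(grade_dev)
print(grade_coef)
```

With random slopes in the formula, the same two calls would return one additional column per random-slope term.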

denis

There isn't the same number of schools for each grade level, so wouldn't this throw an error? – B. Norris Jan 18 '18 at 17:52
