
I have a file that I am working with that contains State Math Test Scores for Grades 4-8 for multiple schools. I am planning on using the lmer command from lme4 to run a separate unconditional model for each grade level (I am doing this because the exams have different scales). I created a for-loop that splits the original data file into grade-specific dataframes:

for(i in levels(MathTestData$CURR_GRADE_LVL)){
  assign(paste("MathTest.",i,sep=""), MathTestData[MathTestData$CURR_GRADE_LVL==i, ])
}

Is there a way to use a loop command that can run an unconditional model for each grade-level dataframe? For the purposes of this example, let's call the dependent variable "MathScore" and the Level-2 id "SchoolID."

Since the grade levels are between 4 and 8, I tried the following for-loop, but it did not work (I get the error: "Error: 'data' not found, and some variables missing from formula environment"):

for(i in 4:8){
  UnconditionalModel.i <- lmer(MathScore~1+(1|SchoolID), data=MathTest.i)
  summary(UnconditionalModel.i)
}

My main concern is this: I understand that one can use lapply on a list, as others have suggested, but doing so seems to lose a lot of valuable information that the lme4 and lmerTest packages would otherwise provide. Specifically, the p-values and test statistics for the fixed effects are missing, and the random effects are reported as standard deviations rather than variances.

B. Norris
  • In general, you will find it difficult to work with sequentially named data frames. Better to keep them in a list: `MathTestList = split(MathTestData, MathTestData$CURR_GRADE_LVL)`. This will make it easy to loop over. See [How to make a list of data frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for more examples. (I would consider your question a duplicate of that one, except that in this particular use-case I think Denis's recommendation to not split at all and use multilevel regression is better.) – Gregor Thomas Jan 18 '18 at 15:00
  • This really isn't a duplicate in terms of making lists from a dataset. As I explain up top, your method of analyzing a list in lme4 causes me to lose important statistical information that I would need to know. – B. Norris Jan 18 '18 at 18:03
  • The `list` solution I propose and the multilevel regression solution that Denis proposes are mutually exclusive. Your problem is that in your loop `data=MathTest.i` doesn't parse the `i`. But if your data subsets are in a list (see my first comment for the command) you can use `data = MathTestList[[i]]` inside your loop. You should also assign the results to a list: `UncondModelList = list()` (before the loop) and the loop becomes `for (i in seq_along(MathTestList)) { UncondModelList[[i]] <- lmer(..., data = MathTestList[[i]]) }`. – Gregor Thomas Jan 18 '18 at 18:39
  • Regarding your last paragraph, I can't imagine how any info would be lost by using a data frame in a list. `lmer` really has no way to know whether the data you give to it comes from a list or not. It certainly doesn't change its reporting of p-values and random effects. Where did you hear this? Have you attempted to verify it at all? – Gregor Thomas Jan 18 '18 at 18:45
  • Do note that p-values for fixed effects have been removed from recent versions of the `lme4` package because the authors feel they are not theoretically sound. This is regardless of whether the data is part of a list or not. – Gregor Thomas Jan 18 '18 at 18:58
  • If you are having trouble using `lmer` on a list, I'd suggest opening a question specifically about that. We can help you work through any problems you're having there. – Gregor Thomas Jan 18 '18 at 21:44
  • Yes, I have attempted to verify it. That's why I am asking the question because I am already aware of these limitations. – B. Norris Jan 19 '18 at 19:07
  • I'd love to see an example of the results differing depending on whether the data frames are in a list or not. I just did a simple test and they seem identical. You can [see my comparison in this gist](https://gist.github.com/gregorp/9dc9f07a86e98b3c0c8a2dce94554b5b). – Gregor Thomas Jan 19 '18 at 19:39
  • This is helpful and what I was looking for information on. Thank you! – B. Norris Jan 19 '18 at 20:15
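
The list-based approach described in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated as a stand-in for `MathTestData` (the `Student` column and all score values are made up), the column names `MathScore`, `SchoolID`, and `CURR_GRADE_LVL` are taken from the question, and both `lme4` and `lmerTest` are assumed to be installed. Loading `lmerTest` is what makes `summary()` report p-values for the fixed effects.

```r
library(lmerTest)  # also loads lme4; its lmer() adds p-values to summary()

# Simulated stand-in for MathTestData (hypothetical values, not real scores)
set.seed(1)
MathTestData <- expand.grid(
  Student = 1:20,
  SchoolID = factor(1:10),
  CURR_GRADE_LVL = factor(4:8)
)
school_effect <- rnorm(10, sd = 5)
MathTestData$MathScore <- 300 +
  school_effect[as.integer(MathTestData$SchoolID)] +
  rnorm(nrow(MathTestData), sd = 10)

# One data frame per grade, kept in a named list ("4", "5", ..., "8")
MathTestList <- split(MathTestData, MathTestData$CURR_GRADE_LVL)

# Fit the unconditional model separately to each grade
UncondModelList <- lapply(MathTestList, function(d) {
  lmer(MathScore ~ 1 + (1 | SchoolID), data = d)
})

# Full summary for one grade, including the test of the fixed intercept
summary(UncondModelList[["4"]])

# Variance components per grade: column vcov is the variance, sdcor the SD
lapply(UncondModelList, function(m) as.data.frame(VarCorr(m)))
```

Each element of `UncondModelList` is an ordinary fitted model, so anything that works on a single `lmer` fit (`summary`, `VarCorr`, `anova`) should work on `UncondModelList[["4"]]` in the same way.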

1 Answer


I think you could use multilevel regression on your entire data frame.

lmer(MathScore ~ SchoolID + (1 + SchoolID | CURR_GRADE_LVL), data = MathTestData)

This way the slope and intercept vary by grade. You can use `ranef` to recover the grade-specific deviations from the fixed effects (or `coef` for the combined per-grade coefficients).
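
To illustrate what `ranef` returns, here is a minimal sketch on simulated data. Note the assumptions: it deliberately uses a simpler model with only a random intercept per grade rather than the formula above, the data frame `d`, the `Student` column, and the `grade_shift` values are all made up for the example, and `lme4` is assumed to be installed.

```r
library(lme4)

# Simulated data (hypothetical): each grade gets its own mean shift
set.seed(2)
d <- expand.grid(
  Student = 1:30,
  SchoolID = factor(1:8),
  CURR_GRADE_LVL = factor(4:8)
)
grade_shift <- c(-20, -10, 0, 10, 20)
d$MathScore <- 300 +
  grade_shift[as.integer(d$CURR_GRADE_LVL)] +
  rnorm(nrow(d), sd = 10)

# Simplified pooled model: one random intercept per grade
m <- lmer(MathScore ~ 1 + (1 | CURR_GRADE_LVL), data = d)

# Per-grade deviations from the overall (fixed) intercept
ranef(m)$CURR_GRADE_LVL

# Per-grade intercepts: fixed intercept plus each grade's deviation
coef(m)$CURR_GRADE_LVL
```

In other words, `ranef` gives offsets relative to the fixed effects, while `coef` adds those offsets to the fixed effects to give one set of coefficients per grade.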

denis