
I am trying out a novel hierarchical linear model, but the data structure makes me wonder whether this is even possible in R. My previous attempts at the model were incorrectly specified (oops), and now I'm not sure how to approach it. My coursework in HLM covered multilevel models and cross-classified models, but not a 3-level double cross-classified model.

Level 1:

  • Responses to dichotomously scored items. Categorical dependent variable, so I think I'm going to be using glmer(). (~1.5 million responses to items)

Level 2:

  • Responses are nested within items - An item will have many responses (from different people), but a single response will not be linked to multiple items.
  • Responses are also nested within testing instances - A test instance will have many responses (50), but a response cannot link to multiple test instances.
  • Items are not nested within test instances and test instances are not nested within items. An item will appear at multiple test instances (every time someone takes Form A) and a test instance will be related to multiple items (each item on the test form).

Level 3:

  • Items are nested within test forms - A form can have several items on it, but (in this case) items cannot appear on multiple forms.
  • Testing instances are nested within people - A person can participate in several test instances but a testing instance can't be executed by multiple people.
  • Testing instances are also nested within location - A location can host several test instances, but a test instance can't occur at multiple locations.
  • Test forms are not nested within people, people are not nested within test forms. A person can take multiple test forms and a test form can be taken by multiple people.
  • People are not nested within location and locations are not nested within people - A person can take a test at multiple locations and several people can take a test at a single location.
  • Test forms are not nested within location and locations are not nested within test forms - A test form can be used at multiple locations; a location can be used to administer many test forms.
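The nesting and crossing claims in the list above can be checked directly against the data rather than assumed. A minimal base-R sketch, assuming one row per response and hypothetical column names (`student`, `instance`, `location`, `form`, `item`); the toy data are made up, and `is_nested()` and `is_crossed()` are illustrative helpers, not functions from any package:

```r
# Made-up toy data: one row per response
toy <- data.frame(
  student  = c("p1", "p1", "p1", "p1", "p2", "p2", "p2", "p2"),
  instance = c("t1", "t1", "t2", "t2", "t3", "t3", "t4", "t4"),
  location = c("L1", "L1", "L2", "L2", "L1", "L1", "L1", "L1"),
  form     = c("A",  "A",  "B",  "B",  "A",  "A",  "B",  "B"),
  item     = c("a1", "a2", "b1", "b2", "a1", "a2", "b1", "b2")
)

# TRUE if every level of `inner` co-occurs with exactly one level of `outer`,
# i.e. `inner` is nested within `outer` in these data
is_nested <- function(inner, outer) {
  n_outer <- tapply(as.character(outer), as.character(inner),
                    function(x) length(unique(x)))
  all(n_outer == 1)
}

# TRUE if neither factor is nested within the other (fully or partially crossed)
is_crossed <- function(a, b) !is_nested(a, b) && !is_nested(b, a)

is_nested(toy$item, toy$form)           # TRUE: items nested in forms
is_nested(toy$instance, toy$student)    # TRUE: instances nested in people
is_nested(toy$instance, toy$location)   # TRUE: instances nested in locations
is_crossed(toy$item, toy$instance)      # TRUE: items crossed with instances
is_crossed(toy$student, toy$location)   # TRUE: people crossed with locations
is_crossed(toy$form, toy$student)       # TRUE: forms crossed with people
```

Because the check works on the ID labels themselves, it also flags IDs that are reused across groups (e.g. instance numbers restarting at 1 for each student): those make a nested factor look crossed, and lme4 would treat them the same way, so such IDs need to be recoded to be unique.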

I hypothesize that some location variables may have an impact on performance on particular items, but I think that will be moderated by things like the ability of the person taking the test. I have explanatory variables at the location, student, and item levels that I'm interested in exploring, like noise level, GPA, and subject matter.

Please let me know if you have any questions or suggestions.

Model diagram

Ben Bolker
Aaron

1 Answer


I don't see why this is a problem. "Modern" mixed-model frameworks, like most of those available in R (nlme, lme4, glmmTMB, etc.), don't incorporate (or require) any explicit statement of nestedness; they only require that you specify the factors that define each grouping variable. I don't offhand see why

(1|student) + (1|location) + (1|instance) + (1|item) + (1|form)

won't work. There are a few things to consider; I don't know whether any of them apply to your problem:

  • if your responses are binary, you need to make sure that you don't have a random effect where every observation falls into a separate random-effect group (this level of variation is unidentifiable). For example, if every student answers a question at most once in every location (i.e. no one ever takes the test more than once at a location, and they never answer a question more than once per test), then any given student/item/location combination will occur either zero or one times in your data set, and the variance of a (1|location:student:item) term would be unidentifiable.
  • there may be some cases where interactions among your variables are identifiable (e.g., does the difficulty of a test item vary across locations?); this could be specified as (1|location:item), but you do have to be a little careful that the levels of your interaction don't uniquely identify observations (see the previous point).
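One way to check the identifiability concern in the points above is to count how many observations fall into each level of a candidate interaction before putting it in the model. A minimal base-R sketch on made-up toy data; `group_sizes()` is a hypothetical helper, and a check like this only rules out the singleton problem, it does not guarantee that a variance is well estimated:

```r
# Made-up toy data: 3 students x 2 locations x 2 items; each student answers
# each item exactly once per location (expand.grid builds all combinations)
toy <- expand.grid(student  = c("s1", "s2", "s3"),
                   location = c("A", "B"),
                   item     = c("i1", "i2"))
toy$correct <- c(1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1)

# Number of observations in each (observed) level of the interaction of `vars`
group_sizes <- function(data, vars) {
  table(interaction(data[vars], drop = TRUE))
}

# Every location:student:item level holds a single observation, so a
# (1 | location:student:item) term would be unidentifiable for 0/1 data
max(group_sizes(toy, c("location", "student", "item")))   # 1

# location:item pools over students: each level has 3 observations,
# so this particular problem does not arise for (1 | location:item)
range(group_sizes(toy, c("location", "item")))            # 3 3
```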

You may find the GLMM FAQ useful, especially this section ...

PS: I would definitely recommend experimenting with a subset of your original data (1.5 million responses is certainly feasible with glmer, but will be slow ...); you may also want to specify control=glmerControl(calc.derivs=FALSE) to skip some of the (slow) diagnostic checks.
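A minimal sketch of that advice, assuming the lme4 package and a data frame with hypothetical column names (`correct` coded 0/1, `student`, `location`, `instance`, `item`, `form`). `fit_crossed_subset()` and its `frac` argument are illustrative, not part of lme4; sampling whole testing instances rather than individual rows is one design choice among several, made here so that no instance is split. The simulated data are made up purely so the example runs:

```r
library(lme4)

# Fit the crossed random-intercept model on a random subset of testing
# instances; `frac` (hypothetical) is the share of instances kept
fit_crossed_subset <- function(dat, frac = 0.1) {
  ids  <- unique(as.character(dat$instance))
  keep <- sample(ids, ceiling(frac * length(ids)))
  sub  <- droplevels(dat[as.character(dat$instance) %in% keep, ])
  glmer(correct ~ 1 + (1 | student) + (1 | location) + (1 | instance) +
          (1 | item) + (1 | form),
        data = sub, family = binomial,
        # skip computing the numerical gradient/Hessian used in convergence checks
        control = glmerControl(calc.derivs = FALSE))
}

# Made-up example data: 300 instances, each one student taking one form
# (10 items per form, items nested in forms) at one location
set.seed(42)
inst  <- data.frame(instance = factor(1:300),
                    student  = factor(sample(60, 300, replace = TRUE)),
                    location = factor(sample(10, 300, replace = TRUE)),
                    form     = factor(sample(4, 300, replace = TRUE)))
items <- data.frame(item = factor(1:40), form = factor(rep(1:4, each = 10)))
d <- merge(inst, items, by = "form")   # one row per response

# Random intercept draw for each level of factor f, expanded to rows
u <- function(f, s) rnorm(nlevels(f), sd = s)[as.integer(f)]
d$correct <- rbinom(nrow(d), 1, plogis(0.5 + u(d$student, 1) + u(d$item, 0.8) +
                      u(d$location, 0.4) + u(d$form, 0.3) + u(d$instance, 0.3)))

fit <- fit_crossed_subset(d, frac = 0.5)   # 150 instances x 10 responses
summary(fit)
```

With small simulated data some variance estimates may land at zero (a "singular fit" message); that is a property of the toy data, not of the model syntax.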

Ben Bolker
  • Thanks for the input! I'm relieved to hear about the syntax - I wasn't seeing what was special in the cross-classified models that would make them different from a straight 3-level model. I guess there isn't a difference, really? Can you expand on what you mean by "every observation falling into a separate random effect group"? Do you have an example of what that might look like? I'll take a look at that FAQ. Thanks! – Aaron Dec 19 '20 at 22:53
  • Yes, there isn't a difference. Some implementations of mixed models have a hard time with crossed (and partially crossed) experimental designs, but not the ones listed in this question. – Ben Bolker Dec 19 '20 at 23:18