lmer: Error for mixed effects model with random intercept - number of levels of each grouping factor must be < number of observations

Question

I'm trying to fit a mixed model to examine differences in warmth and competence ratings depending on the intersection of target age and gender (with race controlled). Participants were asked to rate 2 random targets of different intersectional identities. There are 276 rows of data, 276 unique levels of ResponseId (i.e., 276 participants), 3 age levels (Old, Young, empty), and 3 gender levels (Men, Women, empty).

It appears that using "ResponseId" is not appropriate for running this function - does anyone have an inkling as to why?

Here's what I have so far (note, some of "TargetGender" and "TargetAge" are intended to be empty as participants only evaluated some targets on either gender or age).

Sample data:

         ResponseId TargetAge TargetGender TargetAge2 TargetGender2  Warmth1  Warmth2
1 R_3O1E4cOxRIejI1k       Old        Women                    Women   5.363636 5.272727
2 R_1EaFGkyVNdhlgQO       Old        Women                      Men   5.181818 5.181818
3 R_2eVHfsG4p7g0QZE       Old          Men      Young           Men   3.909091 3.545455
4 R_BtYn33qaXVoYh8d       Old          Men      Young           Men   1.363636 2.636364
5 R_d5S9ajl6C9bfTNL       Old        Women                    Women   4.727273 3.909091
6 R_1kXCRRZvdTmYsj7       Old        Women      Young           Men   5.454545 5.545455

Sample code and error:

model <- lmer(Warmth1 ~ TargetAge*TargetGender + (1 | ResponseId), 
              data=my_data)

Error: number of levels of each grouping factor must be < number of 
    observations (problems: ResponseId)

You likely need to [pivot your data to long format](https://stackoverflow.com/q/2185252/17303805), with one column each for target age, target gender, and warmth rating, and multiple rows for each participant. The error is telling you that since there’s only one row per participant, it doesn’t make sense to nest within participants. — zephryl, Mar 05 '23 at 20:58
Ah, this makes sense looking at some data that other people use. Thank you very much for the insight! — Marco Mai, Mar 05 '23 at 23:52

Ben Bolker · Accepted Answer · 2023-03-06T00:33:14.860

Following up on @zephryl's comment that you need to convert your data to long format ("The error is telling you that since there’s only one row per participant, it doesn’t make sense to nest within participants"):

example data

This is your data from above, modified slightly (adding "1" to the target gender and age variable names for trial 1, to simplify reshaping the data):

dd <- read.csv(header=TRUE, row.names =1, text = "
ResponseId,TargetAge1,TargetGender1,TargetAge2,TargetGender2,Warmth1,Warmth2
1,R_3O1E4cOxRIejI1k,Old,Women,,Women,5.363636,5.272727
2,R_1EaFGkyVNdhlgQO,Old,Women,,Men,5.181818,5.181818
3,R_2eVHfsG4p7g0QZE,Old,Men,Young,Men,3.909091,3.545455
4,R_BtYn33qaXVoYh8d,Old,Men,Young,Men,1.363636,2.636364
5,R_d5S9ajl6C9bfTNL,Old,Women,,Women,4.727273,3.909091
6,R_1kXCRRZvdTmYsj7,Old,Women,Young,Men,5.454545,5.545455
")
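
In this wide layout each participant appears in exactly one row, which is what triggers the error. Here is a minimal sketch of the comparison the error message describes, using hypothetical toy data (the `wide` and `long` data frames and the `R_a`-style IDs are made up for illustration and are not part of the original data):

```r
# Hypothetical wide-format data: one row per participant
wide <- data.frame(
  ResponseId = c("R_a", "R_b", "R_c"),
  Warmth1    = c(5.4, 3.9, 4.7),
  Warmth2    = c(5.3, 3.5, 3.9)
)

# Number of grouping-factor levels vs. number of observations
wide_levels <- length(unique(wide$ResponseId))
wide_obs    <- nrow(wide)
wide_ok     <- wide_levels < wide_obs   # FALSE: 3 levels, 3 rows

# The same ratings stacked into long format: two rows per participant
long <- data.frame(
  ResponseId = rep(wide$ResponseId, times = 2),
  trial      = rep(1:2, each = nrow(wide)),
  Warmth     = c(wide$Warmth1, wide$Warmth2)
)

long_levels <- length(unique(long$ResponseId))
long_obs    <- nrow(long)
long_ok     <- long_levels < long_obs   # TRUE: 3 levels, 6 rows
```

With one row per participant, a per-participant random intercept cannot be separated from the residual error, so the model needs repeated rows per participant.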

reshaping

This is a slightly trickier-than-usual reshaping problem since the target-age, target-gender, and response (warmth) variables all need to be converted to long format. What I've done here works but is a little clunky — there may well be a SO question somewhere that explains how to do this more elegantly.

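library(tidyverse)
## pivot one family of columns (e.g. Warmth1, Warmth2) to long format;
## the trial label is whatever remains after stripping the prefix `nm`
dfun <- function(data, nm = "Warmth") {
    data |> dplyr::select(c(ResponseId, starts_with(nm))) |>
        pivot_longer(cols = starts_with(nm), names_prefix = nm,
                     values_to = nm, names_to = "trial")
}

## reshape each family separately, then join on participant and trial
d_long <- (dfun(dd, "Warmth")
    |> left_join(dfun(dd, "TargetAge"), by = c("ResponseId", "trial"))
    |> left_join(dfun(dd, "TargetGender"), by = c("ResponseId", "trial"))
    |> filter(TargetAge != "")  ## cases missing a trial
)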
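
One possibly tidier alternative, offered as a sketch rather than a tested replacement for the code above: `pivot_longer()` accepts the special `".value"` entry in `names_to`, which reshapes several column families in one call. The toy data frame `dd_toy`, its IDs, and the name `d_long2` are hypothetical and only mirror the column layout used here. The pattern assumes every column other than `ResponseId` ends in a single trial digit.

```r
library(tidyr)

# Hypothetical toy data with the same column layout as `dd`
dd_toy <- data.frame(
  ResponseId    = c("R_a", "R_b", "R_c"),
  TargetAge1    = c("Old", "Old", "Old"),
  TargetGender1 = c("Women", "Men", "Women"),
  TargetAge2    = c("", "Young", "Young"),
  TargetGender2 = c("Women", "Men", "Men"),
  Warmth1       = c(5.36, 3.91, 5.45),
  Warmth2       = c(5.27, 3.55, 5.55)
)

# Split each column name into a variable name (".value") and a trial digit,
# so TargetAge1/TargetAge2 become one TargetAge column, and so on
d_long2 <- pivot_longer(
  dd_toy,
  cols          = -ResponseId,
  names_to      = c(".value", "trial"),
  names_pattern = "^(.+)(\\d)$"
)

# Drop rows where the target age is empty (cases missing a trial)
d_long2 <- d_long2[d_long2$TargetAge != "", ]
```

The result has one row per participant and trial, with `trial`, `TargetAge`, `TargetGender`, and `Warmth` columns, which is the same shape as `d_long`.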

Now we're ready to fit:

library(lme4)
lmer(Warmth ~ TargetAge + TargetGender + (1|ResponseId), d_long)

The maximal model here would be

lmer(Warmth ~ TargetAge + TargetGender + 
        (TargetAge + TargetGender|ResponseId), 
        data = d_long)

because we may need to account for among-participant variation in age and gender effects (see e.g. Barr et al. 2013 "Random effects structure for confirmatory hypothesis testing: Keep it maximal" and Matuschek et al. 2017 "Balancing Type I error and power in linear mixed models").

Thanks Ben! Love that you noted among-participant variation, it's definitely a covariate I'm keeping in mind. I ended up using my own code to convert to long format, but I'm finding an issue with finding all fixed effects (I find them for Asian, Old, Young, Men, but not Women). I have an inkling it has to do with the model code error: "rank deficient so dropping 1 column". Any insights on this? — Marco Mai, Mar 06 '23 at 00:30
(That's a warning, not an error.) If you have dummies for both Men and Women then these two columns plus the intercept are multicollinear/jointly unidentifiable (i.e., since `Men + Women` is always equal to 1, you can't estimate the intercept, an effect for Men, and an effect for Women at the same time ... — Ben Bolker, Mar 06 '23 at 00:34
If this answer resolved your original question you are encouraged to click the check-mark to accept it (you can also upvote if you want) — Ben Bolker, Mar 06 '23 at 00:35
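
The rank-deficiency point in the comments above can be illustrated with a small base-R sketch. The toy `gender` vector is hypothetical and assumes every row is either Men or Women, so the two dummy columns always sum to 1:

```r
# Hypothetical toy data: every row is either Men or Women
gender <- c("Men", "Women", "Women", "Men", "Women")
Men    <- as.numeric(gender == "Men")
Women  <- as.numeric(gender == "Women")

# Intercept plus BOTH dummies: Men + Women equals the intercept column,
# so only two of the three columns carry independent information
X_both    <- cbind(Intercept = 1, Men = Men, Women = Women)
rank_both <- qr(X_both)$rank

# A single factor with default treatment coding: intercept plus one contrast
X_factor    <- model.matrix(~ factor(gender))
rank_factor <- qr(X_factor)$rank
```

Here `X_both` has three columns but `rank_both` is 2, which is the situation that leads a fitting function to drop a column. `X_factor` has two columns and `rank_factor` is also 2, so nothing needs to be dropped.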
