
I am trying to understand how to resolve the following error, which I get when I run glmer() from the lme4 package in R:

Error: number of levels of each grouping factor must be < number of observations.

To give some background, I am asking whether the abundance of a specific bacterial phylum is associated with relative mass gain in a population of animals I am studying. My sample size is pretty small: 20 unique individuals. My response is relative mass gain; my fixed effects are two bacterial phyla (Firmicutes and Bacteroidetes), age class, and date; and my random effects are individual, colony area, and year. This is the code I am running:

Model1 <- glmer(relative_growth ~ firmicutes + bacteroidetes + age_class + date +
                (1|uid) + (1|col_area) + (1|year), data = microbiome)

Here is a sample subset of my data:

dput(microbiome[1:5])
structure(list(uid = structure(c(5L, 8L, 11L, 13L, 9L, 10L, 1L, 
12L, 16L, 17L, 18L, 14L, 20L, 19L, 7L, 4L, 15L, 6L, 2L, 3L), .Label = c("6127_7339", 
"6385_6342", "6609_7388", "6835_6898", "7131_7126", "7187_7189", 
"7279_7197", "7365_7368", "7640_7641", "7753_7754", "7755_7756", 
"7780_7781", "7783_7793", "7828_7874", "7830_7849", "8005_8009", 
"8111_8107", "8476_8478", "8491_8492", "8497_8488"), class = "factor"), 
    year = c(2015L, 2015L, 2016L, 2016L, 2016L, 2016L, 2016L, 
    2016L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2016L, 2016L, 
    2018L, 2018L, 2018L, 2015L), col_area = structure(c(4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 2L, 3L, 3L, 4L, 1L, 2L, 
    2L, 3L, 1L), .Label = c("boulder", "gothictown", "mm_maintalus", 
    "picnic_lower"), class = "factor"), date = structure(c(10L, 
    8L, 5L, 4L, 15L, 15L, 4L, 2L, 12L, 1L, 14L, 9L, 14L, 14L, 
    7L, 3L, 11L, 6L, 13L, 16L), .Label = c("11-Jun-17", "13-Jul-16", 
    "14-Jun-16", "15-Jun-16", "16-Jun-16", "18-Jun-18", "2-Jun-16", 
    "20-Jul-15", "21-May-18", "22-Jun-15", "22-May-18", "25-Jun-17", 
    "27-Jun-18", "28-May-18", "3-Jun-16", "9-Jul-15"), class = "factor"), 
    age_class = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 
    2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("A", 
    "Y"), class = "factor")), row.names = c(NA, -20L), class = "data.frame")
Ben Bolker
  • It would help to see some example data. I suspect the issue is related to `(1|uid)`, which suggests you specified one group per observation (not what you want). – neilfws Feb 11 '20 at 00:37
  • Thank you for your response! I have attached some sample data to the original post. – ginajohnson Feb 11 '20 at 05:32
  • Please post the data [in a plain text format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), not as images, so we can copy/paste it. – neilfws Feb 11 '20 at 09:21
  • I just added to the original post, hopefully this is what you were asking for! – ginajohnson Feb 11 '20 at 19:13

1 Answer

  • Based on the data you've shown us, there seems to be only one observation per uid value. If that's true then, as @neilfws says, your model is overspecified: when fitting a linear mixed model (see the third bullet point below) or a GLMM with a family that has an adjustable scale parameter (e.g. Gamma), an observation-level random effect is confounded with the residual variance (for LMMs) or the scale parameter (for such GLMMs). Just leave the (1|uid) term out of your model.
  • I initially thought the most likely issue was using colony area as a random-effect grouping variable (i.e., on the right side of the bar in a random-effects specification); if colony area is a continuous variable, it may be a nuisance variable (i.e. something you want to control for statistically), but it (usually) can't sensibly be a random-effects grouping variable: this question explains the issue in more detail.
  • More stylistic/aesthetic than substantive: if you are fitting a linear (rather than generalized linear) mixed model, i.e. the (conditional) response is normally distributed, then you should use lmer() rather than glmer(). (If you call glmer() without specifying a family= argument, you get a warning that suggests you should use lmer() instead.) Typically, mass gain would be modeled as normal, or maybe log-Normal or Gamma ...
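
To make the first and third points concrete, here is a minimal sketch. It does not use the real data: `toy` is a simulated stand-in for `microbiome` with one row per individual, the numeric columns are random noise, and `date` is left out for brevity. It counts the levels of each grouping factor, shows that `(1|uid)` triggers the error, and refits with lmer() without that term.

```r
library(lme4)

set.seed(1)
n <- 20

# Hypothetical stand-in for `microbiome`: 20 individuals, one row each.
# Column names follow the question; all values are simulated.
toy <- data.frame(
  uid           = factor(sprintf("id%02d", seq_len(n))),
  col_area      = factor(rep(c("boulder", "gothictown",
                               "mm_maintalus", "picnic_lower"), each = 5)),
  year          = factor(rep(2015:2018, times = 5)),
  age_class     = factor(rep(c("A", "A", "Y"), length.out = n)),
  firmicutes    = runif(n),
  bacteroidetes = runif(n)
)
toy$relative_growth <- rnorm(n)

# Compare the number of levels of each grouping factor with the
# number of observations; uid has as many levels as there are rows.
n_levels <- sapply(toy[c("uid", "col_area", "year")], nlevels)
print(n_levels)
print(n_levels < nrow(toy))

# Including (1 | uid) stops with the error from the question.
err_msg <- tryCatch(
  {
    lmer(relative_growth ~ firmicutes + bacteroidetes + age_class +
           (1 | uid) + (1 | col_area) + (1 | year), data = toy)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Dropping (1 | uid) and calling lmer() directly lets the model fit.
fit <- lmer(relative_growth ~ firmicutes + bacteroidetes + age_class +
              (1 | col_area) + (1 | year), data = toy)
print(fixef(fit))
```

Because the response here is pure noise, lmer() may report a singular fit for the reduced model; that is a property of the simulated data, not of the approach.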
Ben Bolker
  • Thank you for your response! Colony area is a categorical variable. Would this change anything? I attached a photo of the data in the original post. – ginajohnson Feb 11 '20 at 06:01