I had a model of the form mod = lmer(y ~ A + B + (1|line))

  • y is the continuous response variable, about 2000 rows
  • A and B are fixed effects with 2 levels each (not present / present)
  • line is a random effect with about 100 levels

To predict the response variable, all I had to do was construct a New_Data data frame (100 x 3) whose first column contains all lines and whose second and third columns (A and B) are all zeros.

Then I ran adjusted = predict(mod, New_Data). This worked fine.

Now, due to a different experimental design, I have an additional random effect, batch, within which line is nested. The model becomes:

mod = lmer(y ~ A + B + (1|batch) + (1|batch:line))

Again I constructed a New_Data data frame, now with 4 columns, where column 4 holds batch fixed at level 1 (out of 4 possible levels).

But now, when I try to predict, the predict function tells me:

Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new levels detected in newdata

What am I doing wrong?

Minimal Example

library(lme4)
library(data.table)

lines <- factor(c('line.c', 'line.c', 'line.b', 'line.b', 'line.a', 'line.a',
                  'line.d', 'line.d', 'line.e', 'line.e'))

Model_Data <- data.table(y = rnorm(10),
                         A = factor(c('a', 'a', 'b', 'b', 'b', 'a', 'a', 'c', 'c', 'c')),
                         B = lines,
                         C = factor(c(rep(1, 4), rep(2, 4), 3, 3)))

My_Model <- Model_Data[, lmer(y ~ A + (1|C) + (1|C:B))]


Prediction_Data <- data.table(B = levels(lines),
                              A = factor('a', levels=c('a', 'b', 'c')),
                              C = factor(1, levels=c(1,2,3)))

y.adjusted <- predict(My_Model, Prediction_Data)
  • Hi! It would be better if you check out [POST Format](https://stackoverflow.com/help/formatting) for future posts on Stack Overflow. Thank you – Momin Jan 17 '18 at 02:57
  • Sounds like there are new levels in the data you are trying to predict on. It would be easier to diagnose and suggest possible solutions if you could provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we could run and test the code. – MrFlick Jan 17 '18 at 04:44
  • @Momin Gotcha! MrFlick I'll provide one as soon as I have time (should be during the weekend). – Michael Beyeler Jan 17 '18 at 15:01
  • Minimal example added above. :) – Michael Beyeler Jan 17 '18 at 15:46
  • (1) `data.table` seems to use `stringsAsFactors=FALSE` by default, so you might have to explicitly use `factor()` when defining the `B` element. But that still doesn't seem to solve the problem. (2) beyond the error message, can you say what the adjusted output is and how it differs from what you expected? – Ben Bolker Jan 17 '18 at 16:12
  • In my real dataset I have around 100 different lines, and for each of the 100 lines there are 10 individual measurements. C is a batch effect: each line belongs to exactly one batch. What's strange is that the predicted value for about 90% of the lines is exactly the same (around the overall mean of y), and only about 10% seem to have adequate adjusted values. – Michael Beyeler Jan 17 '18 at 16:17
  • I think the problem is the interaction term `C:B`. If you compare `with(Model_Data, table(C,B))` to `with(Prediction_Data, table(C,B))` you'll see that you are trying to predict in groups where no such combination existed in the model. In the model fit it says `"Number of obs: 10, groups: C:B, 5; C, 3"` so there are 5 different groups from the C:B combinations. But your test data has combinations like `B=line.a`, `C=1`. – MrFlick Jan 17 '18 at 18:40
  • Yes exactly, that's true, and it must be that way because I would like to predict the output for all lines in batch 1. So, what would I have to do to implement these combinations adequately? – Michael Beyeler Jan 17 '18 at 18:45
  • Basically, what I'd like to do, is to correct for a batch effect. – Michael Beyeler Jan 17 '18 at 19:30
  • 1
    Well, if you want the linear mixed model you specified, you'd need training data for every group you want make predictions for. If you want to make predictions with random errors for groups that you never observe, then this standard lmer machinery isn't going to work for you. You'd need to make some kine of additional modeling assumptions about the unobserved values. If you need help choosing a statistical model for your data, you should ask your question over at [stats.se]. – MrFlick Jan 17 '18 at 21:00
  • Alright, that sounds reasonable. Thanks for all your help. :) – Michael Beyeler Jan 17 '18 at 21:35
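
The diagnosis from the comments can be sketched in code. This is a minimal illustration, not a recommended fix: it refits the minimal example on a plain data.frame (an assumption made here to avoid the data.table scoping), and the names model_df, fit, pred_df, seen_combos, wanted_combos and unseen are hypothetical. It first lists which batch/line combinations in the prediction data never occurred in the training data, then calls predict() with allow.new.levels = TRUE, which lme4 documents as using the population-level value (a random effect of zero) for previously unobserved levels.

```r
library(lme4)

set.seed(42)  # only so the sketch is repeatable

lines <- factor(c("line.c", "line.c", "line.b", "line.b", "line.a", "line.a",
                  "line.d", "line.d", "line.e", "line.e"))

# Same structure as the minimal example, but as a plain data.frame (assumption)
model_df <- data.frame(
  y = rnorm(10),
  A = factor(c("a", "a", "b", "b", "b", "a", "a", "c", "c", "c")),
  B = lines,
  C = factor(c(rep(1, 4), rep(2, 4), 3, 3))
)

# A tiny data set like this may produce a singular-fit message; that is expected
fit <- lmer(y ~ A + (1 | C) + (1 | C:B), data = model_df)

# All lines, A at level "a", batch fixed at 1 (B made an explicit factor)
pred_df <- data.frame(
  B = factor(levels(lines), levels = levels(lines)),
  A = factor("a", levels = c("a", "b", "c")),
  C = factor(1, levels = c(1, 2, 3))
)

# Which requested C:B combinations were never seen when fitting?
seen_combos   <- unique(paste(model_df$C, model_df$B, sep = ":"))
wanted_combos <- paste(pred_df$C, pred_df$B, sep = ":")
unseen        <- !(wanted_combos %in% seen_combos)
pred_df[unseen, c("C", "B")]

# Unseen C:B levels contribute a random effect of zero instead of an error
y_adjusted <- predict(fit, newdata = pred_df, allow.new.levels = TRUE)
data.frame(pred_df, unseen = unseen, y_adjusted = y_adjusted)
```

In this sketch the unseen rows share the same A level and the same batch, so they all receive the same prediction: the fixed effect plus the batch 1 effect, with no line-specific term. That is one plausible explanation for the identical values described above for most lines, but whether it is an acceptable batch correction is a modeling decision that this code does not settle.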

0 Answers