
So here's my problem. I have a data set in R that I need to run a mixed effects model on. Here's the code:

library(lme4)
data <- read.csv("D:/blahblah.csv")
analysis.data <- lmer(intdiff ~ stress_limit * word_position * follows + (1|speaker), data)
summary(analysis.data)

When I try to run the script, it returns the following error:

 Error in mer_finalize(ans) : Downdated X'X is not positive definite, 15.

I have tracked the error down to the "follows" parameter because when I just use stress_limit and word_position, it runs fine. If it helps, the data in "follows" are only 3 strings: n or l, consonant, vowel. I've tried replacing the spaces with _ but with no success. Is there something about the internal workings of the lmer() function that is preventing the use of "follows" in this case? Any help would be great!

For more info: intdiff contains numeric values, stress_limit contains strings (Stressed or Unstressed), and word_position also contains strings (Word Medial or Word Initial).

EDIT: Here is a data sample that reproduces the error:

structure(list(intdiff = c(11.45007951, 12.40144758, 13.47898367, 
6.279497762, 18.19461897, 16.15539707), word_position = structure(c(2L, 
2L, 2L, 1L, 1L, 1L), .Label = c("Word Initial", "Word Medial"
), class = "factor"), follows = structure(c(4L, 4L, 4L, 1L, 2L, 
4L), .Label = c("Consonant", "n or l", "Pause", "Vowel"), class = "factor"), 
stress_limit = structure(c(2L, 1L, 1L, 2L, 2L, 2L), .Label = c("Stressed", 
"Unstressed"), class = "factor"), speaker = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("f11r", "f13r", "f15a", "f16a", 
"m09a", "m10a", "m12r", "m14r"), class = "factor")), .Names = c("intdiff", 
"word_position", "follows", "stress_limit", "speaker"), row.names = c(NA, 
6L), class = "data.frame")

I tried the lme() function as well, but it returned this error:

Error in MEEM(object, conLin, control$niterEM) : 
Singularity in backsolve at level 0, block 1

The code in my original post is the exact code I'm using, so I'm not leaving out any information I can think of.

My R version is 2.15.2.

Shakesbeery
  • How many rows has your actual data frame `data`? – Sven Hohenstein Feb 02 '13 at 07:25
  • The data frame has about 1110 rows. The data is predictable given a sample, though. – Shakesbeery Feb 02 '13 at 08:54
  • Do your predictor variables contain *all possible combinations* of stress limit, word position, and follows, or are some missing (because they are infeasible or you didn't happen to measure them)? Is `with(data,all(table(stress_limit,word_position,follows)>=1))` true? (This is turning into a stats question rather than a programming question ...) – Ben Bolker Feb 02 '13 at 21:15
  • The result is false, so I guess that indeed, not all possible combinations exist. Does this mean a mixed effects model cannot be run at all? Or can I take this into account when I run the function? – Shakesbeery Feb 02 '13 at 22:09

1 Answer


Hard to tell for sure without a reproducible example: How to make a great R reproducible example?

But, guessing: these sorts of problems are generally due to collinearity in the design matrix. (Centering continuous predictors can sometimes help with this, although here intdiff is the response and all of your predictors are categorical.) You can also explore the design matrix directly:

X <- model.matrix( ~ stress_limit * word_position * follows, data)

You can check for collinearity between pairs of columns with cor(X). Unfortunately I don't have a suggestion off the top of my head for detecting multicollinearity (i.e. not between pairs, but among combinations of more than two predictors), although you can look into the tools for computing variance inflation factors (e.g. library("sos"); findFn("VIF")).
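As a concrete sketch (using made-up data with the question's variable names, not the original CSV), you can compare the rank of the design matrix to its number of columns; a rank deficit is exactly what lmer is complaining about:

```r
# Invented data: one observation for every combination of the three factors
toy <- expand.grid(
  stress_limit  = c("Stressed", "Unstressed"),
  word_position = c("Word Initial", "Word Medial"),
  follows       = c("Consonant", "n or l", "Pause", "Vowel")
)
# Fixed-effects design matrix for the full three-way interaction
X <- model.matrix(~ stress_limit * word_position * follows, toy)
qr(X)$rank == ncol(X)  # TRUE here, because every combination is present
```

If you drop rows so that some combinations never occur, qr(X)$rank falls below ncol(X) and the design matrix is rank deficient.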

As a cross-check, lme should also be able to handle your model:

library(nlme)
lme(intdiff ~ stress_limit * word_position * follows, 
   random=~1|speaker, data=data)

When I run your test data in the development version of lme4 (available on GitHub) I get

Error in lmer(intdiff ~ stress_limit * word_position * follows + (1 | : rank of X = 5 < ncol(X) = 12

With an input data set this small (6 observations), there's no possible way you could fit 12 parameters, so it's a little harder to tell exactly where your problem is. Do all 12 combinations of your 3 variables actually occur in your full data? If some are missing, then you need to follow the advice given in the development version's help:

Unlike some simpler modeling frameworks such as ‘lm’ and ‘glm’ which automatically detect perfectly collinear predictor variables, ‘[gn]lmer’ cannot handle design matrices of less than full rank. For example, in cases of models with interactions that have unobserved combinations of levels, it is up to the user to define a new variable (for example creating ‘ab’ within the data from the results of ‘droplevels(interaction(a,b))’).

In particular, you can fit this model as follows:

data <- transform(data,
       allcomb=interaction(stress_limit,word_position,follows,drop=TRUE))
lme(intdiff ~ allcomb, random=~1|speaker, data=data)

This will give you a one-way ANOVA treating the unique combinations of levels that are actually present in the data as the categories. You'll have to figure out for yourself what they mean.

The alternative is to reduce the number of interactions in the model until you get to a set that doesn't have any missing combinations; if you're lucky, (stress_limit+word_position+follows)^2 (all two-way interactions) will work, but you might have to reduce the model still further (e.g. stress_limit + word_position*follows).

Another way to test this is to use lm() on your proposed models and check that there are no NA values in the estimated coefficients.
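As a minimal illustration of that check (with invented two-factor data, not the question's variables), lm() silently returns NA for any coefficient it can't estimate, so NA coefficients flag a rank-deficient model:

```r
# Invented data where one cell (a2 with b2) never occurs
d <- expand.grid(a = c("a1", "a2"), b = c("b1", "b2"))
d <- d[d$a != "a2" | d$b != "b2", ]        # drop the a2/b2 combination
d <- d[rep(seq_len(nrow(d)), each = 3), ]  # a few replicates per cell
set.seed(1)
d$y <- rnorm(nrow(d))

fit <- lm(y ~ a * b, data = d)
any(is.na(coef(fit)))  # TRUE: the a2:b2 interaction can't be estimated
```

A model with only main effects (y ~ a + b) would fit the same data with no NA coefficients, which is the "reduce the interactions" strategy above.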

The main thing you will be losing in these ways is convenience/ease of interpretation, because the parameters for the missing combinations couldn't have been estimated from the data anyway ...

Ben Bolker
  • First off, thank you so much for answering! I don't really understand the collinearity aspect of the problem; perhaps you could explain? I googled the term, but it didn't clarify much (I'm a beginning statistician). Otherwise, I've improved the available information above, which may help us work this out? – Shakesbeery Feb 02 '13 at 01:56
  • @Shakesbeery yep, that's the difficulty. The tools can only do what the data allow :-( . Reminds me of my first exposure to least-squares analysis. I cheerfully figured that (reducing undergrad physics lab data), even though the underlying model was y ~x^2, I could get a much "cooler" fit to an 8-th order polynomial. fail. This sort of stuff happens to all of us. – Carl Witthoft Feb 02 '13 at 14:43
  • Fantastic, this has been an extremely helpful and informative reply! I think I can take it from here. Again, you have my most sincere thanks. – Shakesbeery Feb 03 '13 at 00:04