15

I'm attempting to run a repeated-meaures ANOVA using R. I've gone through various examples on various websites, but they never seem to talk about the error that I'm encountering. I assume I'm misunderstanding something important.

The ANOVA I'm trying to run is on some data from an experiment using human participants. It has one DV and three IVs. All of the levels of all of the IVs are run on all participants, making it a three-way repeated-measures / within-subjects ANOVA.

The code I'm running in R is as follows:

aov.output = aov(DV~ IV1 * IV2 * IV3 + Error(PARTICIPANT_ID / (IV1 * IV2 * IV3)),
                 data=fulldata)

When I run this, I get the following warning:

Error() model is singular

Any ideas what I might be doing wrong?

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
vize
  • 281
  • 1
  • 2
  • 10
  • 1
    A quick googling of this error (which is often a good tactic) led me to this page: http://tolstoy.newcastle.edu.au/R/help/04/10/5215.html The relevant part is here: I think that means the correct error model is Error(Subject/T.norm.Class): my guess is that WasSick is a subject-level observation and so each subject only has one level of it. Certainly that is the model which was fitted. - Professor Brian Ripley. /end quote. I suspect that you have specified an incorrect error distribution, but without more information it is hard to be sure – richiemorrisroe Apr 17 '11 at 16:46
  • 3
    looks like your random effects part is far to complex. Singular models often indicate that you've tried to fit a too complex model without sufficient data/observations. – Gavin Simpson Apr 17 '11 at 17:30
  • 4
    By the way, this Q is OT for this site - you would be better asking on http://stats.stackexchange.com – Gavin Simpson Apr 17 '11 at 17:31
  • @richiemorrisroe hmm, I had googled this, but managed to miss the link you are pointing to. For all participants, the IVs are given at all levels, so it's not the case that each subject has only one level of any of them. For this, there are 2 levels of IV1, 5 levels of IV2 and 2 levels of IV3. – vize Apr 17 '11 at 17:53
  • @Gavin Simpson what counts as 'too complex a model' ? Also, thanks for the tip for stats.stackexchange - I'll post there in future for questions like this. I had looked there, but that seemed to have more questions about pure stats and theory, rather than R and scripting/programming, which stackoverflow seems to cater to more. – vize Apr 17 '11 at 17:55
  • @vize i) SO is for _programming_ and the Crossvalidated site I suggested is certainly the place for questions about doing stats using software. ii) too complex is one where you have more terms than observations or approaching that limit. You are fitting a 3-way interaction in the fixed effects and something quite complex involving this 3-way interaction in the random effects. You need a minimum of 20 observations just to fit the fixed effects. Are you double accounting here - the same terms are in both the fixed and random effects? – Gavin Simpson Apr 17 '11 at 18:02
  • what is the output of with( fulldata, table( IV1, IV2, IV3, PARTICIPANT_ID) )? Each IV should be equal across participant and represent your expected levels of IV. – John Apr 17 '11 at 21:33

2 Answers2

15

Try using the lmer function in the lme4 package. The aov function is probably not appropriate here. Look for references from Dougles Bates, e.g. http://lme4.r-forge.r-project.org/book/Ch4.pdf (the other chapters are great too, but that is the repeated measures chapter, this is the intro: http://lme4.r-forge.r-project.org/book/Ch1.pdf). The R code is at the same place and for longitudinal data, it seems to be generally considered wrong these days to just fit OLS instead of a components of variance model like in the lme4 package, or in nlme, which to me seems to have been wildly overtaken by lme4 in popularity recently. You may note Brian Ripley's referenced post in the comments section above just recommends switching to lme also.

By the way, a huge advantage off the jump is you will be able to get estimates for the level of each effect as adjustments to the grand mean with the typical syntax:

lmer(DV ~ 1  +IV1*IV2*IV3 +(IV1*IV2*IV3|Subject), dataset))

Note your random effects will be vector valued.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Patrick McCann
  • 484
  • 4
  • 11
  • Great, thanks! It seems to be working now. Thanks very much for the tip, lmers seem the way to go :) – vize Apr 18 '11 at 08:25
10

I know the answer has been chosen for this post. I still wish to point out how to specify a correct error term/random effect when fitting a aov or lmer model to a multi-way repeated-measures data. I assume that both independent variables (IVs) are fixed, and are crossed with each other and with subjects, meaning all subjects are exposed to all combinations of the IVs. I am going to use data taken from Kirk’s Experimental Deisign: Procedures for the Behavioral Sciences (2013).

library(lme4)
library(foreign)
library(lmerTest)
library(dplyr)

file_name <- "http://www.ats.ucla.edu/stat/stata/examples/kirk/rbf33.dta" #1
d <- read.dta(file_name) %>%                                              #2
  mutate(a_f = factor(a), b_f = factor(b), s_f = factor(s))               #3

head(d)
    ##   a b s  y a_f b_f s_f
    ## 1 1 1 1 37   1   1   1
    ## 2 1 2 1 43   1   2   1
    ## 3 1 3 1 48   1   3   1
    ## 4 2 1 1 39   2   1   1
    ## 5 2 2 1 35   2   2   1

In this study 5 subjects (s) are exposed to 2 treatments - type of beat (a) and training duration (b) - with 3 levels each. The outcome variable is the attitude toward minority. In #3 I made a, b, and s into factor variables, a_f, b_f, and s_f. Let p and q be the numbers of levels for a_f and b_f (3 each), and n be the sample size (5).

In this example the degrees of freedom (dfs) for the tests of a_f, b_f, and their interaction should be p-1=2, q-1=2, and (p-1)*(q-1)=4, respectively. The df for the s_f error term is (n-1) = 4, and the df for the Within (s_f:a_f:b_f) error term is (n-1)(pq-1)=32. So the correct model(s) should give you these dfs.

Using aov

Now let’s try different model specifications using aov:

aov(y ~ a_f*b_f + Error(s_f), data=d) %>% summary()         # m1

aov(y ~ a_f*b_f + Error(s_f/a_f:b_f), data=d) %>% summary() # m2

aov(y ~ a_f*b_f + Error(s_f/a_f*b_f), data=d) %>% summary() # m3

Simply specifying the error as Error(s_f) in m1 gives you the correct dfs and F-ratios matching the values in the book. m2 also gives the same value as m1, but also the infamous “Warning: Error() model is singular”. m3 is doing something strange. It is further partitioning Within residuals in m1 (634.9) into residuals for three error terms: s_f:a_f (174.2), s_f:b_f (173.6), and s_f:a_f:b_f (287.1). This is wrong, since we would not get three error terms when we run a 2-way between-subjects ANOVA! Multiple error terms are also contrary to the point of using block factorial designs, which allows us to use the same error term for the test of A, B, and AB, unlike split-plot designs which requires 2 error terms.

Using lmer

How can we get the same dfs and F-values using lmer? If your data is balanced, the Kenward-Roger approximation used in the lmerTest will give you exact dfs and F-ratio.

lmer(y ~ a_f*b_f + (1|s_f), data=d) %>% anova()         # mem1

lmer(y ~ a_f*b_f + (1|s_f/a_f:b_f), data=d) %>% anova() # mem2

lmer(y ~ a_f*b_f + (1|s_f/a_f*b_f), data=d) %>% anova() # mem3

lmer(y ~ a_f*b_f + (1|s_f:a_f:b_f), data=d) %>% anova() # mem4

lmer(y ~ a_f*b_f + (a_f*b_f|s_f), data=d) %>% anova()   # mem5

Again simply specifying the random effect as (1|s_f) give you the correct dfs and F-ratios (mem1). mem2-5 did not even give results, presumably the numbers of random effects it needed to estimate were greater than the sample size.

Masato Nakazawa
  • 970
  • 6
  • 11