I am trying to run a two-way repeated measures ANOVA in R using the anova_test function in the rstatix package. I am roughly following the tutorial found here. My data consist of several ant colonies ("Colony"), each split into 3 treatments ("Size"). I collected data ("g") at 8 time points ("Time"). I have uploaded a subset of my data to GitHub, but here is a brief summary:

 # A tibble: 24 x 6
   Species Colony Fragment Size  Time      g
   <fct>   <fct>  <fct>    <fct> <fct> <dbl>
 1 obs     5      5L       L     1     0.565
 2 obs     2      2L       L     2     0.002
 3 obs     8      8L       L     3     0.699
 4 obs     12     12L      L     4     0.257
 5 obs     12     12L      L     5     0.131
 6 obs     3      3L       L     6     0.014
 7 obs     10     10L      L     7     0.15 
 8 obs     12     12L      L     8     0.054
 9 obs     10     10M      M     1     0.448
10 obs     8      8M       M     2     0.135
# ... with 14 more rows

I have tried running the two-way repeated measures ANOVA in three different ways, with the following code:

aov <- df %>% anova_test(g ~ Size*Time + Error(Colony/(Size*Time)))
aov <- df %>% anova_test(dv=g, wid = Colony, within= c(Size,Time))
aov <- anova_test(data = df, dv=g, wid=Colony, within=c(Size, Time))

They each output the following error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I have tried the same code on two sample datasets that are formatted similarly to my dataset, and the function works perfectly (and each method outputs the same results). Here are summaries of the sample datasets for reference:

# A tibble: 6 x 4
  id    treatment time  score
  <fct> <fct>     <fct> <dbl>
1 7     ctr       t1       92
2 6     ctr       t2       65
3 12    ctr       t3       62
4 6     Diet      t1       76
5 9     Diet      t2       94
6 7     Diet      t3       87



# A tibble: 6 x 4
        len supp   dose    id
      <dbl> <fct> <dbl> <int>
    1  21.5 OJ      0.5     2
    2  14.5 OJ      1       9
    3  22.4 OJ      2       3
    4   4.2 VC      0.5     1
    5  17.3 VC      1       4
    6  29.5 VC      2      10

I have verified that my data does not have any NA values with any(is.na(df)) which returns FALSE.

I came across a similar question and one helpful poster suggested that this error might be due to a linear combination, rather than NA values. I decided to check my data using lm(g ~ Colony+Time:Size, data=df) and, indeed, it appears that I do have a linear combination:

Call:
lm(formula = g ~ Colony + Time:Size, data = df)

Coefficients:
(Intercept)      Colony1      Colony2      Colony3      Colony4      Colony5  Time1:SizeL  Time2:SizeL  Time3:SizeL  
   0.044167    -0.118549    -0.108424     0.076868     0.073243     0.034368     0.213000     0.351167     0.199833  
Time4:SizeL  Time5:SizeL  Time6:SizeL  Time7:SizeL  Time8:SizeL  Time1:SizeM  Time2:SizeM  Time3:SizeM  Time4:SizeM  
   0.060667     0.071333     0.005000     0.017000    -0.029167     0.239667     0.216333     0.174667     0.050500  
Time5:SizeM  Time6:SizeM  Time7:SizeM  Time8:SizeM  Time1:SizeS  Time2:SizeS  Time3:SizeS  Time4:SizeS  Time5:SizeS  
   0.069500     0.033167     0.011500    -0.003667    -0.015500     0.081167     0.020000     0.042500     0.026333  
Time6:SizeS  Time7:SizeS  Time8:SizeS  
  -0.014333    -0.000500           NA  

However, I do not understand why. The Time8:SizeS category is essentially the same as all of the other Time:Size combinations. If anyone can explain why I might be running into this error or has a solution for how I could carry out a two-way repeated measures anova (with or without anova_test) on my data, I would greatly appreciate it!

Thanks in advance!

StupidWolf
West9

1 Answer

I need to read the code for rstatix::anova_test again, but your design is fine: it is balanced, and what is causing the problem is the extra columns. I suspect the pivoting inside the function goes haywire because of them. Selecting only the columns used in the model makes it work:

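library(rstatix)
library(dplyr)

# Read the subset of the data posted on GitHub
df <- read.csv("https://raw.githubusercontent.com/mwest9/sample_data/master/test_repeat_anova.csv")

# Colony and Time are read in as numbers, so convert them to factors
df$Colony <- factor(df$Colony)
df$Time <- factor(df$Time)

# Keep only the columns used in the model, then run the repeated measures ANOVA
df %>%
  select(g, Size, Time, Colony) %>%
  anova_test(g ~ Size*Time + Error(Colony/(Size*Time)))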

ANOVA Table (type III tests)

     Effect DFn DFd     F       p p<.05   ges
1      Size   2  10 4.098 0.05000       0.075
2      Time   7  35 5.428 0.00028     * 0.209
3 Size:Time  14  70 1.595 0.10200       0.099
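
If you want to check for yourself that the design is balanced, you can count the observations in each Colony x Size x Time cell. Below is a minimal sketch on a synthetic data frame with the same layout as yours (6 colonies, 3 sizes, 8 time points, which matches the degrees of freedom above). The name toy and the values of g are made up for illustration; swap in your own df.

```r
# Synthetic stand-in for the real data: 6 colonies x 3 sizes x 8 time points
set.seed(1)
toy <- expand.grid(Colony = factor(1:6),
                   Size   = factor(c("L", "M", "S")),
                   Time   = factor(1:8))
toy$g <- runif(nrow(toy))  # made-up response values

# Exactly one observation per Colony x Size x Time cell means the
# repeated measures design is complete and balanced
cell_counts <- table(toy$Colony, toy$Size, toy$Time)
print(all(cell_counts == 1))

# Dropping a single row leaves an empty cell, which shows up as a zero count
toy_missing <- toy[-1, ]
missing_counts <- table(toy_missing$Colony, toy_missing$Size, toy_missing$Time)
print(any(missing_counts == 0))
```

On your data the equivalent call would be table(df$Colony, df$Size, df$Time).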

Note that the output contains only the ANOVA table and not the sphericity test or corrections that the documentation describes:

Mauchly’s Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly’s test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.

Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values.
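
As for the NA you saw in lm(g ~ Colony + Time:Size): as far as I can tell, that comes from how the formula is parameterised and not from a problem in the data. With Time:Size but no Size or Time main effects, R builds one indicator column for each of the 24 Time:Size cells, and those 24 columns add up to the intercept column, so one coefficient cannot be estimated and is reported as NA. A minimal sketch on synthetic data (toy and its g values are made up, not your data) shows the same behaviour:

```r
# Synthetic balanced design: 6 colonies x 3 sizes x 8 time points
set.seed(1)
toy <- expand.grid(Colony = factor(1:6),
                   Size   = factor(c("L", "M", "S")),
                   Time   = factor(1:8))
toy$g <- runif(nrow(toy))  # made-up response values

# Interaction without main effects: one indicator column per Time:Size cell
fit_cells <- lm(g ~ Colony + Time:Size, data = toy)
X <- model.matrix(fit_cells)

# The cell indicators sum to 1 in every row, i.e. they reproduce the intercept
cell_cols <- grep(":", colnames(X), fixed = TRUE)
print(all(rowSums(X[, cell_cols]) == 1))

# So the design matrix has one more column than its rank ...
print(c(columns = ncol(X), rank = fit_cells$rank))

# ... and lm() reports the redundant coefficient as NA
print(names(coef(fit_cells))[is.na(coef(fit_cells))])

# With main effects included, the usual contrast coding is used and nothing is NA
fit_full <- lm(g ~ Colony + Size * Time, data = toy)
print(anyNA(coef(fit_full)))
```

So a single NA in that check does not by itself mean the design is unbalanced or that a cell is missing.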

StupidWolf
  • Thank you so much! It never even occurred to me that the extra columns would interfere. I will look into Mauchly's Test for Sphericity. Your help is much appreciated! – West9 Apr 23 '20 at 23:33
  • You're welcome :) You cannot do Mauchly's test here; I added that note because people sometimes wonder why it is not reported. – StupidWolf Apr 23 '20 at 23:37