
I am trying to follow the tutorial by Datanovia for Two-way repeated measures ANOVA.

A quick overview of my dataset:

I have measured the number of different bacterial species in 12 sampling units over time. I have 16 time points and 2 groups. I have organised my data as a tibble called "richness":

# A tibble: 190 x 4
   id    selection.group Day   value
   <fct> <fct>           <fct> <dbl>
 1 KRH1  KR              2      111.
 2 KRH2  KR              2      141.
 3 KRH3  KR              2      110.
 4 KRH1  KR              4      126 
 5 KRH2  KR              4      144 
 6 KRH3  KR              4      135.
 7 KRH1  KR              6      115.
 8 KRH2  KR              6      113.
 9 KRH3  KR              6      107.
10 KRH1  KR              8      119.

The id refers to each sampling unit, and the selection group is a factor with two levels (KR and RK).

richness <- tibble(
  id = factor(c("KRH1", "KRH3", "KRH2", "RKH2", "RKH1", "RKH3")), 
  selection.group = factor(c("KR", "KR", "KR", "RK", "RK", "RK")), 
  Day = factor(c(2,2,4,2,4,4)), 
  value = c(111, 110, 144,  92,  85,  69))  # subset of original data

My tibble appears to be in the same format as the one in the tutorial:

> str(selfesteem2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   72 obs. of  4 variables:
 $ id       : Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ treatment: Factor w/ 2 levels "ctr","Diet": 1 1 1 1 1 1 1 1 1 1 ...
 $ time     : Factor w/ 3 levels "t1","t2","t3": 1 1 1 1 1 1 1 1 1 1 ...
 $ score    : num  83 97 93 92 77 72 92 92 95 92 ..

Before I can run the repeated measures ANOVA I must check for normality in my data. I copied the framework proposed in the tutorial.

#my code
richness %>%
  group_by(selection.group, Day) %>%
  shapiro_test(value)

#tutorial code
selfesteem2 %>%
  group_by(treatment, time) %>%
  shapiro_test(score)

But I get the error message "Error: Column variable is unknown" when I try to run the code. Does anyone know why this happens?

I then continued without confirming that my data are normally distributed and tried to run the ANOVA:

res.aov <- rstatix::anova_test(
  data = richness, dv = value, wid = id,
  within = c(selection.group, Day)
  )

But I get this error message: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

I have checked for NA values with any(is.na(richness)) which returns FALSE. I have also checked table(richness$selection.group, richness$Day) to be sure my setup is correct


     2 4 6 8 12 16 20 24 28 29 30 32 36 40 44 50
  KR 6 6 6 6  6  6  6  6  6  6  6  5  6  6  6  6
  RK 6 6 6 6  6  5  6  6  6  6  6  6  6  6  6  6

And the setup appears correct. I would be very grateful for tips on solving this.

Best regards Madeleine

Below is a subset of my dataset in a reproducible format:

library(tidyverse)
library(rstatix)
library(tibble)

richness_subset = data.frame(
  id = c("KRH1", "KRH3", "KRH2", "RKH2", "RKH1", "RKH3"), 
  selection.group = c("KR", "KR", "KR", "RK", "RK", "RK"), 
  Day = c(2,2,4,2,4,4), 
  value = c(111, 110, 144,  92,  85,  69))

# convert the columns of richness_subset itself to factors
richness_subset$Day = factor(richness_subset$Day)
richness_subset$selection.group = factor(richness_subset$selection.group)
richness_subset$id = factor(richness_subset$id)

richness_subset = tibble::as_tibble(richness_subset)

richness_subset %>%
  group_by(selection.group, Day) %>%
  shapiro_test(value)

# gives Error: Column `variable` is unknown

res.aov <- rstatix::anova_test(
  data = richness_subset, dv = value, wid = id,
  within = c(selection.group, Day)
)

# gives Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
#  0 (non-NA) cases
Maddie
  • Please share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Viewing the `print()` or `str()` doesn't make it easy to copy/paste the data to test your code. – MrFlick Feb 03 '20 at 20:58
  • "Error: Column variable is unknown" might occur because a variable name is mistyped. Is it possible that you forgot to capitalise a letter? In group_by(selection.group, Day), for instance, I see that Day has a capital D. – Jeroen Feb 03 '20 at 21:01
  • Looking at the source for `shapiro_test`, my guess was that you had a group of observations that were all NA, or a single row. Looking at the tests you've done, that doesn't seem to be the case. Without your full data to test, it's hard to help. If you don't want to share it, you can always try to anonymize it with something like `richness$value <- richness$value + rnorm(190)`. – Brian Feb 03 '20 at 21:44
  • @Brian, I don't think that is the issue here. I assumed the same thing, and also that, as there are multiple groups, one group might be too small to meet the conditions for `shapiro.test`. But after running a similar example and excluding groups with fewer than 3 observations, I get the same issue. The error seems to be related to `dplyr` more than to `shapiro_test`, but I haven't yet pinpointed it. – dc37 Feb 03 '20 at 21:57

3 Answers

I created some data with a design similar to yours:

set.seed(111)
richness = data.frame(
  id = rep(c("KRH1","KRH2","KRH3"), 6),
  selection.group = rep(c("KR","RK"), each = 9),
  Day = rep(c(2,4,6), each = 3, times = 2),
  value = rpois(18, 100))

richness$Day = factor(richness$Day)
richness$id = factor(richness$id)

First, shapiro_test: there is a bug in the function, so the column you want to test cannot be named "value":

# gives error Error: Column `variable` is unknown
richness %>% shapiro_test(value)

#works
richness %>% mutate(X = value) %>% shapiro_test(X)
# A tibble: 1 x 3
  variable statistic     p
  <chr>        <dbl> <dbl>
1 X            0.950 0.422
1 X            0.963 0.843
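
As a cross-check that does not go through the rstatix wrapper at all, the per-group test from the question can also be sketched with base R's shapiro.test(). This is only an illustration: the data frame `toy` and its values are made up here and are not the asker's data, and shapiro.test() needs at least 3 observations per group.

```r
# Toy data: 2 groups x 2 days x 3 sampling units (illustrative values only).
toy <- data.frame(
  selection.group = rep(c("KR", "RK"), each = 6),
  Day = factor(rep(rep(c(2, 4), each = 3), times = 2)),
  value = c(111, 141, 110, 126, 144, 135, 92, 88, 97, 85, 69, 90)
)

# Split the response by every selection.group x Day combination.
groups <- split(toy$value, list(toy$selection.group, toy$Day))

# Run the Shapiro-Wilk test in each cell and keep the p-value.
p_values <- sapply(groups, function(x) shapiro.test(x)$p.value)
print(p_values)
```

Because the column is passed as a plain vector, its name in the data frame no longer matters.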

Second, for the anova, this works for me.

rstatix::anova_test(
  data = richness, dv = value, wid = id,
  within = c(selection.group, Day)
  )

In my example every term can be estimated. What I suspect is that, in your data, one of the terms is a linear combination of the others. If I modify my example so that each id no longer appears in every combination of group and day:

set.seed(111)
richness = data.frame(
  id = rep(c("KRH1","KRH2","KRH3","KRH4","KRH5","KRH6"), 3),
  selection.group = rep(c("KR","RK"), each = 9),
  Day = rep(c(2,4,6), each = 3, times = 2),
  value = rpois(18, 100))

richness$Day = factor(richness$Day)
richness$id = factor(richness$id)

rstatix::anova_test(
  data = richness, dv = value, wid = id,
  within = c(selection.group, Day)
  )

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

This gives exactly the same error. You can check which terms are affected using:

lm(value~id+Day:selection.group,data=richness)


   Call:
lm(formula = value ~ id + Day:selection.group, data = richness)

Coefficients:
           (Intercept)                     id1                     id2  
               101.667                  -3.000                  -6.000  
                   id3                     id4                     id5  
                -6.000                   1.889                  11.556  
Day2:selection.groupKR  Day4:selection.groupKR  Day6:selection.groupKR  
                 1.667                 -12.000                   9.333  
Day2:selection.groupRK  Day4:selection.groupRK  Day6:selection.groupRK  
                -1.667                      NA                      NA 

The Day4:selection.groupRK and Day6:selection.groupRK terms are not estimable because each is a linear combination of terms that come earlier in the model.
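
To see where the aliasing comes from, it can help to tabulate how often each id occurs in each selection.group × Day cell. The sketch below rebuilds the two designs above and counts empty cells and NA coefficients; the helper names `make_design`, `empty_cells` and `aliased_terms` are made up for illustration. One plausible reading, stated as an assumption rather than something checked against the rstatix source, is that a within-subject factor needs every id to be observed in every cell.

```r
# Rebuild the two designs from the examples above (same id / group / Day layout).
make_design <- function(ids, times) {
  set.seed(111)
  data.frame(
    id = factor(rep(ids, times)),
    selection.group = factor(rep(c("KR", "RK"), each = 9)),
    Day = factor(rep(c(2, 4, 6), each = 3, times = 2)),
    value = rpois(18, 100)
  )
}

crossed    <- make_design(c("KRH1", "KRH2", "KRH3"), 6)  # design that runs
incomplete <- make_design(paste0("KRH", 1:6), 3)         # design that fails

# Number of id x (selection.group, Day) combinations with no observation.
empty_cells <- function(d) {
  cells <- interaction(d$selection.group, d$Day)
  sum(table(d$id, cells) == 0)
}

# Number of coefficients that lm() cannot estimate (reported as NA).
aliased_terms <- function(d) {
  fit <- lm(value ~ id + Day:selection.group, data = d)
  sum(is.na(coef(fit)))
}

print(c(crossed = empty_cells(crossed), incomplete = empty_cells(incomplete)))
print(c(crossed = aliased_terms(crossed), incomplete = aliased_terms(incomplete)))
```

The crossed design has no empty cells, while in the incomplete design each id is missing from half of the cells. Note that with this formula a single NA coefficient shows up even for the crossed design, because the six cell indicators add up to the intercept; it is the additional NA that points to the incomplete design.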

StupidWolf
  • I was looking at that hardcoded `variable` and `value` in the `shapiro_test` function definition and knew that was somehow the culprit, but couldn't figure out renaming the variable was the solution! – Brian Feb 04 '20 at 01:27
  • Changing the column name "value" to something else actually helped! Thank you so much! – Maddie Feb 04 '20 at 07:43
  • Cool! Glad it helped :) – StupidWolf Feb 04 '20 at 08:33
  • Could you explain a bit more why a linear combination of factors could prohibit the ANOVA from running? – Louie Lee Jul 01 '21 at 17:49
  • I followed your suggestions and I tried to simply delete the time point data that has the NA in the lm coefficients. However, when I rerun ANOVA, the new last time point now becomes the NA. I deleted again and the 2nd last time point become the NA. Is this normal? How to solve this issue? – Louie Lee Jul 01 '21 at 17:54
  • @LouieLee, the answer above means you should check whether all the effects can be estimated. A linear combination of factors means the design matrix is rank-deficient, so the coefficients are not uniquely determined and the fit runs into a problem. – StupidWolf Jul 02 '21 at 13:15
  • I cannot comment more on your problem without seeing the data, if you have a new problem please post it as a new question with reproducible example @LouieLee – StupidWolf Jul 02 '21 at 13:16
  • @StupidWolf hi, this is happening to me too. Could you please explain how to properly create this ID variable? – 12666727b9 Mar 20 '23 at 16:34

The solution proposed above for running shapiro_test worked.

I also figured out that I have some linear combination by running lm(value~id+Day:selection.group,data=richness). However, I don't understand why. I know I have data points for each group (see graph). Where does this linear combination come from?

Repeated measure ANOVA appears so appropriate for me as I am following sampling units over time.

enter image description here
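
One way to look for the source of the linear combination is to check whether each id occurs in both selection groups. Below is a minimal base R sketch on the six-row subset from the question; the names `id_by_group` and `groups_per_id` are made up for illustration.

```r
# Six-row subset from the question.
richness_subset <- data.frame(
  id = c("KRH1", "KRH3", "KRH2", "RKH2", "RKH1", "RKH3"),
  selection.group = c("KR", "KR", "KR", "RK", "RK", "RK"),
  Day = c(2, 2, 4, 2, 4, 4),
  value = c(111, 110, 144, 92, 85, 69)
)

# Cross-tabulate sampling units against selection groups.
id_by_group <- table(richness_subset$id, richness_subset$selection.group)
print(id_by_group)

# In how many groups does each id appear?
groups_per_id <- rowSums(id_by_group > 0)
print(groups_per_id)

# Untested suggestion, not run here: if every id sits in exactly one group,
# selection.group could be declared as a between-subject factor, e.g.
# rstatix::anova_test(data = richness, dv = value, wid = id,
#                     within = Day, between = selection.group)
```

In this subset every id belongs to exactly one group, so selection.group varies between sampling units rather than within them. If the full data look the same, moving selection.group to the `between` argument of anova_test() and keeping only Day in `within` may be worth trying; that is a suggestion and has not been tested on the full data.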

Maddie

I had the same issue and could not find the cause. What finally worked for me was the ezANOVA() function from the "ez" package (install it first). The general form is:

newModel <- ezANOVA(data = dataFrame, dv = .(outcome variable), wid = .(variable that identifies participants), within = .(repeated measures predictors), between = .(between-group predictors), detailed = FALSE, type = 2)

Example:

bushModel <- ezANOVA(data = longBush, dv = .(Retch), wid = .(Participant), within = .(Animal), detailed = TRUE, type = 3)

Vrutang Shah