0

I'm a beginner in r and I need some help. I've managed to follow instructions from a simple, friendly tutorial to do the 2-way ANOVA for repeated measure analysis (Group: 3TW or 5TW and 12 time points). This is my entire code:

RAWdata$Animal <-as.factor(RAWdata$Animal)
head(RAWdata)
RAWdata <- RAWdata %>%
  gather(key = "time", value = "score", T1, T2, T3, T4, T5, T6, T7, T8, T9, T10, T11, T11, T12) %>%
  convert_as_factor(Animal, time)

set.seed(123)
RAWdata %>% sample_n_by(Group, time, size = 1)
RAWdata %>%
  group_by(Group, time) %>%
  get_summary_stats(score, type = "mean_sd") 
RAWdata %>%
  group_by(Group, time) %>%
  identify_outliers(score)
Q1 <- quantile(RAWdata$score, .25)
Q3 <- quantile(RAWdata$score, .75)
IQR <- IQR(RAWdata$score)
data_no_OL <- subset(RAWdata, RAWdata$score > (Q1 - 1.5*IQR) & RAWdata$score < (Q3 + 1.5*IQR))
dim(RAWdata)
dim(data_no_OL)
data_no_OL %>%
  group_by(Group, time) %>%
  get_summary_stats(score, type = "mean_sd") 
data_no_OL %>%
  group_by(Group, time) %>%
  shapiro_test(score)
ggqqplot(data_no_OL, "score", ggtheme = theme_bw()) +
  facet_grid(time ~ Group, labeller = "label_both")
BSizeanova.aov <- anova_test(
  data = data_no_OL, dv = score, wid = Animal,
  within = time, between = Group
)
get_anova_table(BSizeanova.aov)
timeXGroup <- data_no_OL %>%
  group_by(time) %>%
  anova_test(dv = score, wid = Animal, between = Group) %>%
  get_anova_table() %>%
  adjust_pvalue(method = "bonferroni")
timeXGroup
PWCtimeXGroup <- data_no_OL %>%
  group_by(time) %>%
  pairwise_t_test(
    score ~ Group,
    p.adjust.method = "bonferroni"
  )
PWCtimeXGroup
  

Everything works until I try to run the next command:

ggplot(data_no_OL, aes(x = factor(time), y = score, fill = Group)) + 
  geom_bar(stat = "identity", position = "dodge", alpha = 0.5, colour = "gray25")  +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), position = position_dodge(0.9), width = 0.25,
                show.legend = FALSE, colour = "gray25") +
  labs(x="Timepoint (˚C)", y="Binge-size") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.75)) +
  geom_text(aes(label=Tukey), position = position_dodge(0.90), size = 3, 
            vjust=-0.8, hjust=-0.5, colour = "gray25") +
  ylim(0, 1500) +
  scale_fill_grey()

This is the error:

Error in mean - sd : non-numeric argument to binary operator

This is my data:

> dput(RAWdata[1:22, ])
structure(list(Animal = structure(c(1L, 2L, 13L, 14L, 15L, 16L, 
17L, 18L, 19L, 20L, 21L, 22L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L), levels = c("C1/B", "C1/N", "C13/B", "C13/N", "C14/B", 
"C14/N", "C16/B", "C16/N", "C17/B", "C17/N", "C18/B", "C18/N", 
"C2/B", "C2/N", "C3/B", "C3/N", "C4/B", "C4/N", "C5/B", "C5/N", 
"C6/B", "C6/N"), class = "factor"), group = c("3TW", "3TW", "3TW", 
"3TW", "3TW", "3TW", "3TW", "3TW", "3TW", "3TW", "3TW", "3TW", 
"5TW", "5TW", "5TW", "5TW", "5TW", "5TW", "5TW", "5TW", "5TW", 
"5TW"), time = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("T1", 
"T10", "T11", "T12", "T2", "T3", "T4", "T5", "T6", "T7", "T8", 
"T9"), class = "factor"), score = c(0.06778, 0.05257, 0.06126, 
0.11417, 0.06098, 0.07841, 0.05312, 0.07186, 0.18092, 0.01574, 
0.08187, 0.0335, 0.02649, 0.09314, 0.05225, 0.04965, 0.0462, 
0.07008, 0.10587, 0.03933, 0.0206, 0.08313)), row.names = c(NA, 
22L), class = "data.frame")

I tried to solve it myself but it just keep repeating itself. Am I doing this right? I'll appreciate any advise you can give me!

  • Welcome to StackOverflow. The error may be because your `data_no_OL` does not contain columns named `mean` and/or `sd`. However, we need further info to help you. The best thing would be to make your data available and post the entire code so that your error is reproducible for us. Here is a guide on how to create an optimal reproducible example for others: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Paul Schmidt Sep 14 '22 at 06:41
  • Thank you so much! I hope I did it right - I typed dput(RAWdata, file = "BS.norm.F0.csv"). I also typed dput(RAWdata[1:22, ]). I took a screen shot of the output, which you can see here: https://i.postimg.cc/sgS0205t/dput.png . – אלין קצ'וקי Sep 14 '22 at 07:35
  • You are almost there! The `dput(RAWdata[1:22, ])` is fine, but I cannot work with the screenshot. You would actually need to copy the output from the console. Then you edit your question and paste this output code there. Once it is there, I can copy your data and try to fix your problem. Besides this, I have one more point: Do you have more code than just the `ggplot()` part you posted above? Like some code including an `anova()` command? Your data is called `RAWdata` but later it is `data_no_OL`. Please post everything necessary to reproduce your error. – Paul Schmidt Sep 15 '22 at 06:50
  • Thank you for your patience. I edited the question, and if there is anything else I need to add just let me know. – אלין קצ'וקי Sep 15 '22 at 14:27

1 Answers1

0

Ok, so I did manage to resolve the error, but I'm afraid this is not the full solution to what you want. Please find my comments below the reprex:

library(rstatix)
library(tidyverse)

RAWdata <- structure(list(
  Animal = structure(
    c(1L, 2L, 13L, 14L, 15L, 16L, 17L, 18L, 
      19L, 20L, 21L, 22L, 3L, 4L, 5L, 6L, 7L, 
      8L, 9L, 10L, 11L, 12L), 
    levels = c("C1/B", "C1/N", "C13/B", "C13/N", "C14/B", 
               "C14/N", "C16/B", "C16/N", "C17/B", "C17/N", 
               "C18/B", "C18/N", "C2/B", "C2/N", "C3/B", 
               "C3/N", "C4/B", "C4/N", "C5/B", "C5/N", 
               "C6/B", "C6/N"), class = "factor"), 
  Group = c("3TW", "3TW", "3TW", "3TW", "3TW", "3TW", "3TW", 
            "3TW", "3TW", "3TW", "3TW", "3TW", "5TW", "5TW", 
            "5TW", "5TW", "5TW", "5TW", "5TW", "5TW", "5TW", "5TW"), 
  time = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
                   levels = c("T1", "T10", "T11", "T12", "T2", "T3", 
                              "T4", "T5", "T6", "T7", "T8", "T9"), 
                   class = "factor"), 
  score = c(0.06778, 0.05257, 0.06126, 0.11417, 0.06098, 0.07841, 
            0.05312, 0.07186, 0.18092, 0.01574, 0.08187, 0.0335, 
            0.02649, 0.09314, 0.05225, 0.04965, 0.0462, 0.07008, 
            0.10587, 0.03933, 0.0206, 0.08313)), 
  row.names = c(NA, 22L), class = "data.frame")


# based on your approach --------------------------------------------------
myMeanSD <- RAWdata %>%
  group_by(Group, time) %>%
  get_summary_stats(score, type = "mean_sd") 


ggplot(myMeanSD, aes(x = factor(time), y = mean, fill = Group)) +
  geom_bar(
    stat = "identity",
    position = "dodge",
    alpha = 0.5,
    colour = "gray25"
  )  +
  geom_errorbar(
    aes(ymin = mean - sd, ymax = mean + sd),
    position = position_dodge(0.9),
    width = 0.25,
    show.legend = FALSE,
    colour = "gray25"
  ) +
  labs(x = "Timepoint (˚C)", y = "Binge-size") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  theme(legend.position = c(0.1, 0.75)) +
  # geom_text(
  #   aes(label = Tukey),
  #   position = position_dodge(0.90),
  #   size = 3,
  #   vjust = -0.8,
  #   hjust = -0.5,
  #   colour = "gray25"
  # ) +
  scale_y_continuous(
    limits = c(0, NA),
    expand = expansion(mult = c(0, 0.1))
  ) +
  scale_fill_grey()

Created on 2022-09-16 with reprex v2.0.2

Comments:

  1. I suspect that you provided only a subset of the data since the column time contains only one entry (=1). This is not really a problem here, but it makes the within = time you used unnecessary. Furthermore, it is no longer representative for a "repeated measures" ANOVA because measures are only present for one point of time and are thus not repeated. But as long as you are aware of this, we can still fix your error.
  2. The actual issue causing your error was simply that the data you provided (=data_no_OL) does not have columns named mean and sd. However, in the geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd)...) you explicitly call for such columns. I noticed that you actually did compute mean and sd in your code, but you did not save the output into an object. Thus, I did just that - I copied that part of your code and saved the results in myMeanSD and then used this (and not data_no_OL) as the data the ggplot is based on. As you can see, this gets us a big part of the plot you tried to create: bars with the height of means and errorbars as mean+-sd.
  3. However, I did comment out the geom_text() which I think is supposed to add the results of a Tukey test that compares the means. Maybe it is supposed to be in the form of a compact letter display? Either way, I do not think you have actually done a Tukey test anywhere in your code and there is no column called Tukey anywhere, as far as I can tell. Thus, this part of the ggplot is still missing.
  4. I do not actually understand why you call this 2-way ANOVA. As far as I can tell you only compare one factor: Group?

At this point, I would like to point you to other resources that deal with the same statistical issue, but use other R functions.

  • Pleas see this chapter on compact letter displays for more details on the pros and cons of a compact letter display. I think that this is what you wanted. Maybe even scroll down there to the last alternative plot.
  • The topic of compact letter displays is a bit more complex when you have two factors and especially interactions between them. Check out this answer for details on what your options are.
  • Finally, here is a chapter on repeated measures.
Paul Schmidt
  • 1,072
  • 10
  • 23