R: p-value for each row from anova & lm()

Question

I'm trying to run an ANOVA for each row and then extract the p-values for plotting. As a reference, I'm trying to adapt code from this post: R, extracting p-value for each row from t.test

Here is my snippet:

> anova.007.mRNA<-x007 %>%
+ rowwise() %>%
+ mutate(pval = anova(c(C1,C2,C3,C4,C5,C6),
+ c(H1,H2,H3,H4,H5,H6))$p.value) %>%
+ ungroup()

...but I get an error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: no applicable method for 'anova' applied to an object of class "c('double', 'numeric')".

This is odd, since I thought the anova test would be applied in a similar fashion. Maybe I need to create a linear model with lm() first?

score 1 · Answer 1 · answered Aug 21 '18 at 00:44

# example data
df = read.table(text = "
C1       C2      C3     C4       C5     C6     H1    H2 H3  H4  H5  H6
8.57345 8.45938 8.68941 8.35913 8.48177 8.44560 8.40986 8.59392 8.46562 8.07999 8.22759 8.41817
8.32595 8.19273 8.10708 8.48156 7.99014 8.24859 8.78216 8.59592 8.48299 8.52647 8.34797 8.38534
", header = TRUE)

library(tidyverse)

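# for each row: stack the 12 values, label the first 6 as "C" and the last 6 as "H",
# fit lm() and take the p-value of the group term from the anova table
df %>%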
  rowwise() %>%
  mutate(pval = anova(lm(c(C1,C2,C3,C4,C5,C6,
                           H1,H2,H3,H4,H5,H6) ~ c(rep("C",6),rep("H",6))))$`Pr(>F)`[1]) %>%
  ungroup()

# # A tibble: 2 x 13
#      C1    C2    C3    C4    C5    C6    H1    H2    H3    H4    H5    H6   pval
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
# 1  8.57  8.46  8.69  8.36  8.48  8.45  8.41  8.59  8.47  8.08  8.23  8.42 0.155 
# 2  8.33  8.19  8.11  8.48  7.99  8.25  8.78  8.60  8.48  8.53  8.35  8.39 0.0109

anova() needs a fitted model object (here from lm()) as input, not two vectors. You need to build the model by combining all C and H values into one response vector and creating (manually here) a grouping variable, with the two groups as the independent variable.

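Also, extracting the p-value is not as simple as with t.test: anova() returns a table rather than an object with a p.value element, so the p-value is taken from the first entry of its `Pr(>F)` column.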
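To make the two points above concrete, here is a minimal sketch for a single row, written in base R. The names `values`, `group`, `fit` and `tab` are only illustrative and do not appear in the code above; the numbers are row 1 of the example data.

```r
# Row 1 of the example data as plain vectors (illustrative names)
values <- c(8.57345, 8.45938, 8.68941, 8.35913, 8.48177, 8.44560,   # C1..C6
            8.40986, 8.59392, 8.46562, 8.07999, 8.22759, 8.41817)   # H1..H6
group  <- c(rep("C", 6), rep("H", 6))   # "C" six times, then "H" six times

fit <- lm(values ~ group)   # anova() needs this model object, not the raw vectors
tab <- anova(fit)           # a table with one row for "group" and one for "Residuals"

print(tab)

# The p-value lives in the `Pr(>F)` column; row 1 is the group term,
# the Residuals row has no p-value (NA)
pval <- tab$`Pr(>F)`[1]
pval
```

The value of `pval` can be compared with the `pval` column for row 1 in the output above. Printing `tab` first is a quick way to see why the backticks and the `[1]` are needed.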

I noticed the `(rep("C"...` in the 4th row of the snippet - what does that represent? Thank you very much - this is extremely helpful! — Oars, Aug 21 '18 at 01:04
It creates a vector where C is repeated 6 times and then H 6 times. — AntoniosK, Aug 21 '18 at 07:22
Thanks again. I'm also trying to get a count of p-values lower than p = 0.01. I've tried this: `pv007 <- pvals.007.mRNA <= 0.01` but I get a warning message: `In Ops.factor(left, right) : ‘<=’ not meaningful for factors` — Oars, Aug 21 '18 at 20:04
I've found a solution to get the tally/count of the plots; however, tibble truncates the results so you only see the top ten rows, bummer! `pvals.007.mRNA %>% group_by(pval) %>% count(pval < 0.01)` — Oars, Aug 21 '18 at 21:00
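On the counting question in the comments: the `Ops.factor` warning suggests the values being compared were stored as a factor, although the thread does not confirm this. A minimal base R sketch under that assumption follows; the vector `pvals` and its contents are made up for illustration.

```r
# Hypothetical p-values stored as a factor (an assumption; this is the
# situation in which `<=` gives the "not meaningful for factors" warning)
pvals <- factor(c("0.155", "0.0109", "0.004", "0.2"))

# Convert through character first: as.numeric() directly on a factor
# returns the internal level codes, not the original numbers
p_num <- as.numeric(as.character(pvals))

# TRUE counts as 1, so sum() gives the number of p-values below the cutoff
n_below <- sum(p_num < 0.01)
n_below

# Both counts (FALSE and TRUE) at once
table(p_num < 0.01)
```

In the dplyr version from the comment, `group_by(pval)` creates one group per distinct p-value, which is probably why the result is so long; `count(pval < 0.01)` without the grouping step should return just two rows. To show every row of a tibble, `print(x, n = Inf)` can be used.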
