R, extracting p-value for each row from t.test

Question

I'm trying to run a `t.test` for each row and then extract the p-values for plotting. As a reference, I found this old post: "output p value from a t-test in R".

Here is my snippet:

> pVal143<-apply(mRNA143.data, 1, t.test)$p.value

But when I call it, it only returns `NULL`. Below are a few rows of my data for reference. Thanks.

       c.mRNA    h.mRNA
1    8.224342  8.520142
2    9.096665 11.762597
3   10.698863 10.815275
4   10.666233 10.972130
5   12.043525 12.140297

UPDATE with original dataset "c007" (I need to compare the p-values from the "C" values and H values).

                                        C1       C2      C3     C4       C5     C6     H1    H2 H3  H4  H5  H6
NP_000005   P01023  Protein Name    8.57345 8.45938 8.68941 8.35913 8.48177 8.44560 8.40986 8.59392 8.46562 8.07999 8.22759 8.41817
NP_000010   P24752  Protein Name    8.32595 8.19273 8.10708 8.48156 7.99014 8.24859 8.78216 8.59592 8.48299 8.52647 8.34797 8.38534
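A note on the `NULL` in the snippet above: `apply()` with `t.test` as the function returns a list holding one test result per row, and `$p.value` on that outer list finds no element of that name. A minimal sketch, using a small matrix `m` as a hypothetical stand-in for `mRNA143.data` (first three rows of the sample above). Extracting the p-value inside the applied function avoids the `NULL`, although each row then gets a one-sample test on just two numbers, which is rarely meaningful:

```r
# Hypothetical stand-in for mRNA143.data (values copied from the sample rows)
m <- cbind(c.mRNA = c(8.224342, 9.096665, 10.698863),
           h.mRNA = c(8.520142, 11.762597, 10.815275))

res <- apply(m, 1, t.test)   # a list with one "htest" object per row
is.list(res)                 # TRUE
is.null(res$p.value)         # TRUE: the outer list has no element "p.value"

# Pull the p-value out inside the function that is applied to each row
pvals <- apply(m, 1, function(r) t.test(r)$p.value)
length(pvals)                # one value per row
```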

Please add a sample of `mRNA143.data` to your question to make your problem reproducible. Preferably using `dput(mRNA143.data)`. — markus, Aug 19 '18 at 17:55
You've asked a similar question [here](https://stackoverflow.com/questions/51913634/t-test-for-genes-using-apply-function-in-dataframe). A _t_-test with only one observation per group (which I am assuming you're trying to do in this case) will probably not fare well (see the Wikipedia article on the _t_-test for why this is theoretically a no-no). What you are basically trying to do is `t.test(x = 8.224342, y = 8.520142)`. — Roman Luštrik, Aug 19 '18 at 18:02
Above is the sample of `mRNA143.data`: two columns, 143 rows (I've only included 5). — Oars, Aug 19 '18 at 18:05
It doesn't matter how many rows you have, because you're trying to get a p-value FOR EACH row and this is not possible. t-test needs to "understand" the distribution of your data and a 2 group comparison with 1 point each doesn't make sense. What MAKES SENSE is comparing a group of rows, or comparing your 2 columns. The link you provided uses `t.test(1:10, 7:20)` which compares a group with 10 values (1:10) vs. a group with 14 values (7:20). — AntoniosK, Aug 19 '18 at 18:19
Like @RomanLuštrik said, you need more observations. Tip: `t.test` can do it with length(x) == length(y) == 2. — Rui Barradas, Aug 19 '18 at 18:20
The columns in my example are the rowMeans that I already calculated. I could go back to the original dataset and try it for columns 4:9 and 10:15? — Oars, Aug 19 '18 at 18:27
Now that you showed us your original dataset it makes sense... — AntoniosK, Aug 19 '18 at 18:53
So since I have 143 rows, would I just use `t.test(c007 [, 4:9], c007 [, 10:15])$p.value` for a p.value for each row? — Oars, Aug 19 '18 at 19:07
No, it's not going to work and you can test it to see what you get. You should expect one comparison, and therefore one p-value, because `t.test` will not work in vectorised way. It will combine all values (i.e. rows) you have in columns 4:9 to form group A and all values you have in columns 10:15 to form group B. You should be able to use my solution and get what you want. — AntoniosK, Aug 19 '18 at 19:18
Just in case someone finds it helpful, I've found (but never used so far) a `t.test` alternative from package `BSDA`, called `tsum.test`. Whereas `t.test` requires the value of each observation in each group, `tsum.test` requires summary information (means, standard deviations and sample sizes) for each group. It seems a very good alternative when dealing with huge datasets. — AntoniosK, Aug 20 '18 at 16:31

score 1 · Answer 1 · answered Aug 19 '18 at 18:55


One solution where you have to manually specify which columns belong to each group for comparison:

# example data (the two rows from the question, without the identifier columns)
df = read.table(text = "
C1       C2      C3     C4       C5     C6     H1    H2 H3  H4  H5  H6
8.57345 8.45938 8.68941 8.35913 8.48177 8.44560 8.40986 8.59392 8.46562 8.07999 8.22759 8.41817
8.32595 8.19273 8.10708 8.48156 7.99014 8.24859 8.78216 8.59592 8.48299 8.52647 8.34797 8.38534
", header = TRUE)

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(pval = t.test(c(C1,C2,C3,C4,C5,C6),
                       c(H1,H2,H3,H4,H5,H6))$p.value) %>%
  ungroup()

# # A tibble: 2 x 13
#      C1    C2    C3    C4    C5    C6    H1    H2    H3    H4    H5    H6   pval
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
# 1  8.57  8.46  8.69  8.36  8.48  8.45  8.41  8.59  8.47  8.08  8.23  8.42 0.161 
# 2  8.33  8.19  8.11  8.48  7.99  8.25  8.78  8.60  8.48  8.53  8.35  8.39 0.0110
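For readers without the tidyverse, the same row-wise comparison can be sketched in base R with `apply()`. This is an alternative to the `rowwise()` pipeline, not the code used above; `c_cols` and `h_cols` are names chosen here for illustration, and the column positions are assumed to match the example data:

```r
# Same example data as above
df <- read.table(text = "
C1 C2 C3 C4 C5 C6 H1 H2 H3 H4 H5 H6
8.57345 8.45938 8.68941 8.35913 8.48177 8.44560 8.40986 8.59392 8.46562 8.07999 8.22759 8.41817
8.32595 8.19273 8.10708 8.48156 7.99014 8.24859 8.78216 8.59592 8.48299 8.52647 8.34797 8.38534
", header = TRUE)

c_cols <- 1:6    # positions of C1..C6 (assumed layout)
h_cols <- 7:12   # positions of H1..H6 (assumed layout)

# One Welch two-sample t-test per row; keep only the p-value
pval <- apply(as.matrix(df), 1,
              function(r) t.test(r[c_cols], r[h_cols])$p.value)
pval             # should agree with the pval column shown above
```

`pval` is a plain numeric vector here, so it can be passed straight to `plot()` or `hist()`.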

An alternative solution where you reshape your data and your 2 groups are created from the first letter of each column:

df %>%
  mutate(id = row_number()) %>%                 # add row id
  gather(key, value, -id) %>%                   # reshape dataset
  mutate(key = substr(key,1,1)) %>%             # create a group column from first letter (will be used for the t.test comparison)
  group_by(id) %>%                              # for each row
  summarise(pval = t.test(value ~ key)$p.value) # get p value 

# # A tibble: 2 x 2
#      id   pval
#   <int>  <dbl>
# 1     1 0.161 
# 2     2 0.0110
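The idea of deriving the two groups from the first letter of each column name also has a base R counterpart. A minimal sketch, assuming every column name starts with `C` or `H`; `grp` is an illustrative name, not part of the answer's code:

```r
# Same example data as above
df <- read.table(text = "
C1 C2 C3 C4 C5 C6 H1 H2 H3 H4 H5 H6
8.57345 8.45938 8.68941 8.35913 8.48177 8.44560 8.40986 8.59392 8.46562 8.07999 8.22759 8.41817
8.32595 8.19273 8.10708 8.48156 7.99014 8.24859 8.78216 8.59592 8.48299 8.52647 8.34797 8.38534
", header = TRUE)

# Group label per column, taken from the first letter of its name
grp <- substr(names(df), 1, 1)   # "C" or "H" for each column

# For each row, compare the values labelled "C" with those labelled "H"
pval <- apply(as.matrix(df), 1, function(r) t.test(r ~ grp)$p.value)
pval
```

The formula interface (`r ~ grp`) needs exactly two distinct group labels, so any identifier columns would have to be dropped before this step.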

answered Aug 19 '18 at 18:55

AntoniosK


I was able to execute your code, many thanks: `pvals.007 <- x007 %>% rowwise() %>% mutate(pval = t.test(c(C1,C2,C3,C4,C5,C6), c(H1,H2,H3,H4,H5,H6))$p.value) %>% ungroup()`, followed by `plot(pvals.007)`. When I plot it I was expecting a scatter plot of p-values for each row (comparing C1-6 and H1-6), but instead I'm getting 143 little vertical marks. I also cannot create a histogram, as I get an error saying 'x' must be numeric. – Oars Aug 19 '18 at 20:03
`pvals.007` is a dataframe and not a vector of (143) p-values. Try to use `pvals.007$pval`, as this is the column where all p-values are stored. – AntoniosK Aug 19 '18 at 20:11
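The difference between the data frame and its `pval` column can be sketched as follows. The data are simulated purely for illustration (the real `x007` is not shown in the thread), and base R `apply()` stands in for the `rowwise()` pipeline:

```r
set.seed(42)  # simulated stand-in for x007: 20 rows, 6 C and 6 H replicates
x007 <- as.data.frame(matrix(rnorm(20 * 12, mean = 8.4, sd = 0.2), nrow = 20))
names(x007) <- c(paste0("C", 1:6), paste0("H", 1:6))

# p-value per row, stored as an extra column next to the original values
pvals.007 <- x007
pvals.007$pval <- apply(as.matrix(x007), 1,
                        function(r) t.test(r[1:6], r[7:12])$p.value)

is.numeric(pvals.007)        # FALSE: a whole data frame, which hist() rejects
is.numeric(pvals.007$pval)   # TRUE: a plain numeric vector

plot(pvals.007$pval, xlab = "Row", ylab = "p-value")   # one point per row
hist(pvals.007$pval, xlab = "p-value", main = "p-values per row")
```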

Many thanks - I really appreciate it! I'd mark your response as "answered" but I don't have the reputation points. Tough crowd :-) – Oars Aug 19 '18 at 20:15
Don't worry about that. The point is to learn something that you can use in the future in a similar case :) – AntoniosK Aug 19 '18 at 20:18
I'm also trying to run a version of the t-test with equal variances set to FALSE. For the mutate step in your code I've attempted the following, but I get an error: `mutate(pval = t.test(var.equal = FALSE(c(C1,C2,C3,C4,C5,C6),` – Oars Aug 20 '18 at 09:58
`t.test` by default assumes `var.equal = FALSE`. You can try `var.equal = TRUE` if you want like this: `mutate(pval = t.test(c(C1,C2,C3,C4,C5,C6), c(H1,H2,H3,H4,H5,H6), var.equal = TRUE)$p.value)` – AntoniosK Aug 20 '18 at 10:23
You're right, I guess if I want to run a t.test with the assumption of unequal variances then the default setting is all I need. BTW, thanks for helping me shift the `var.equal` statement to the end of the argument list. – Oars Aug 20 '18 at 10:52
It works in the beginning as well, but seems you have used a `(` instead of a `,` in your code :) – AntoniosK Aug 20 '18 at 10:55
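A small sketch of the two points made in these comments: the default call is the Welch (unequal variances) test, and a named argument such as `var.equal` can sit anywhere in the call as long as it is separated by commas. The vectors `c_vals` and `h_vals` are illustrative names holding the second example row:

```r
# Second row of the example data, split into its two groups
c_vals <- c(8.32595, 8.19273, 8.10708, 8.48156, 7.99014, 8.24859)
h_vals <- c(8.78216, 8.59592, 8.48299, 8.52647, 8.34797, 8.38534)

welch  <- t.test(c_vals, h_vals)                     # default: var.equal = FALSE
pooled <- t.test(c_vals, h_vals, var.equal = TRUE)   # named argument last
first  <- t.test(var.equal = TRUE, c_vals, h_vals)   # named argument first

welch$method                                    # names the Welch variant
isTRUE(all.equal(pooled$p.value, first$p.value))  # TRUE: position is irrelevant
c(welch = welch$p.value, pooled = pooled$p.value)
```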
I tried to replace `t.test` with `anova` using the same snippet, but I get an error. – Oars Aug 20 '18 at 23:19
I'd recommend you post it as a different question with a specific example. – AntoniosK Aug 20 '18 at 23:20
