R: t-test over all columns

Question

I tried to run a t-test on every pair of columns in my data frame and extract only the p-values. Here is what I have come up with:

for (i in c(5:525) ) {

t_test_p.value =sapply( Data[5:525], function(x) t.test(Data[,i],x, na.rm=TRUE)$p.value)

}

My questions are: 1. Is there a way to do this without a loop? 2. How can I capture the results of the t-tests?

score 18 · Answer 1 · answered Mar 12 '12 at 07:45

I would recommend converting your data frame to long format and using pairwise.t.test with an appropriate p.adjust method:

> library(reshape2)
> 
> df <- data.frame(a=runif(100),
+          b=runif(100),
+          c=runif(100)+0.5,
+          d=runif(100)+0.5,
+          e=runif(100)+1,
+          f=runif(100)+1)
> 
> d <- melt(df)
Using  as id variables
> 
> pairwise.t.test(d$value, d$variable, p.adjust = "none")

    Pairwise comparisons using t tests with pooled SD 

data:  d$value and d$variable 

  a      b      c      d      e   
b 0.86   -      -      -      -   
c <2e-16 <2e-16 -      -      -   
d <2e-16 <2e-16 0.73   -      -   
e <2e-16 <2e-16 <2e-16 <2e-16 -   
f <2e-16 <2e-16 <2e-16 <2e-16 0.63

P value adjustment method: none 
> pairwise.t.test(d$value, d$variable, p.adjust = "bon")

    Pairwise comparisons using t tests with pooled SD 

data:  d$value and d$variable 

  a      b      c      d      e
b 1      -      -      -      -
c <2e-16 <2e-16 -      -      -
d <2e-16 <2e-16 1      -      -
e <2e-16 <2e-16 <2e-16 <2e-16 -
f <2e-16 <2e-16 <2e-16 <2e-16 1

P value adjustment method: bonferroni
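To capture the p-values rather than only print them, the object returned by pairwise.t.test can be stored; its p.value component is a lower-triangular matrix. The following is a minimal sketch, not part of the original answer. It assumes simulated data similar to the above and uses base R's stack() in place of melt() so that no extra package is needed.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(a = runif(100),
                 b = runif(100),
                 c = runif(100) + 0.5)

# stack() is base R; its columns `values` and `ind` play the role
# of melt()'s `value` and `variable`
long <- stack(df)

res <- pairwise.t.test(long$values, long$ind, p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values; unused cells are NA
pmat <- res$p.value
print(pmat)

# pool.sd = FALSE runs a separate Welch test per pair instead of pooling the SD
res_welch <- pairwise.t.test(long$values, long$ind,
                             p.adjust.method = "none", pool.sd = FALSE)
print(res_welch$p.value)
```

With pool.sd = FALSE and no adjustment, each cell should match what a plain t.test() call on the two columns returns, which is closer to what the question's loop was computing.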

MYaseen208 · Accepted Answer · 2012-03-12T04:25:39.120

16

Try this one

X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(X, Y, Z)

library(plyr)

combos <- combn(ncol(Data),2)

adply(combos, 2, function(x) {
  test <- t.test(Data[, x[1]], Data[, x[2]])

  out <- data.frame("var1" = colnames(Data)[x[1]]
                    , "var2" = colnames(Data)[x[2]]
                    , "t.value" = sprintf("%.3f", test$statistic)
                    ,  "df"= test$parameter
                    ,  "p.value" = sprintf("%.3f", test$p.value)
                    )
  return(out)

})



  X1 var1  var2 t.value       df p.value
1  1   X      Y  -5.598 92.74744   0.000
2  2   X      Z  -9.361 90.12561   0.000
3  3   Y      Z  -3.601 97.62511   0.000
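Note that sprintf() turns t.value and p.value into character strings, which is why very small p-values show up as 0.000. If you need the numbers for further work, a base R variant along the following lines may help. This is only a sketch under the same simulated-data assumption; the names rows, result, and p.adj are made up for illustration, and the Holm correction is just one possible choice.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
Data <- data.frame(X = rnorm(50, mean = 10, sd = 5),
                   Y = rnorm(50, mean = 15, sd = 6),
                   Z = rnorm(50, mean = 20, sd = 5))

combos <- combn(ncol(Data), 2)  # one column per pair of column indices

# Build one single-row data frame per pair, then bind them together
rows <- lapply(seq_len(ncol(combos)), function(k) {
  i <- combos[1, k]
  j <- combos[2, k]
  test <- t.test(Data[[i]], Data[[j]])
  data.frame(var1    = names(Data)[i],
             var2    = names(Data)[j],
             t.value = unname(test$statistic),
             df      = unname(test$parameter),
             p.value = test$p.value)
})
result <- do.call(rbind, rows)

# p.value stays numeric, so a multiple-testing correction is one call away
result$p.adj <- p.adjust(result$p.value, method = "holm")
print(result)
```

Rounding can then be left to print() or format() at display time, so the stored values keep full precision.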

edited Mar 12 '12 at 04:25

answered Mar 12 '12 at 04:01

MYaseen208


1

MYaseen208's answer is better. In general the plyr package should be used as much as possible. Dead handy!!! – Davy Kavanagh Mar 12 '12 at 04:02
1

Also just realised, if you wanted all pair-wise combinations, then MYaseen208's answer also shows you how to use combn() – Davy Kavanagh Mar 12 '12 at 04:05
1

Thanks, this works like a charm. I do have a follow-up question though: http://stackoverflow.com/q/9669411/612191 – ery Mar 12 '12 at 14:58

score 5 · Answer 3 · answered Mar 12 '12 at 05:38

Here is another solution, with outer.

outer( 
  1:ncol(Data), 1:ncol(Data), 
  Vectorize(
    function (i,j) t.test(Data[,i], Data[,j])$p.value
  ) 
)
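The matrix returned by outer() has no row or column names, contains every pair twice, and also tests each column against itself. A possible way to tidy it up is sketched below. It assumes a small simulated Data frame, and the name pmat is chosen here only for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
Data <- data.frame(X = rnorm(50, mean = 10, sd = 5),
                   Y = rnorm(50, mean = 15, sd = 6),
                   Z = rnorm(50, mean = 20, sd = 5))

idx <- seq_len(ncol(Data))

# outer() passes whole index vectors at once, hence the Vectorize() wrapper
pmat <- outer(idx, idx,
              Vectorize(function(i, j) t.test(Data[[i]], Data[[j]])$p.value))

# Label rows and columns with the variable names
dimnames(pmat) <- list(names(Data), names(Data))

# Blank the diagonal and the upper triangle so each pair appears once
pmat[upper.tri(pmat, diag = TRUE)] <- NA
print(pmat)
```

For a data frame with several hundred columns this still runs every test twice, so a combn()-based approach is likely to be about twice as fast.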

answered Mar 12 '12 at 05:38

Vincent Zoonekynd


score 2 · Answer 4 · answered Mar 12 '12 at 04:01

Assuming your data frame looks something like this:

df = data.frame(a=runif(100),
                b=runif(100),
                c=runif(100),
                d=runif(100),
                e=runif(100),
                f=runif(100))

then the following

tests = lapply(seq(1,length(df),by=2),function(x){t.test(df[,x],df[,x+1])})

will give you tests for each set of columns. Note that this only gives you a t-test for a & b, c & d, and e & f. If you want a & b, b & c, c & d, d & e, and e & f, then you would have to do:

tests = lapply(seq(1,(length(df)-1)),function(x){t.test(df[,x],df[,x+1])})

Finally, if you only want the p-values from your tests, you can do this:

pvals = sapply(tests, function(x){x$p.value})

If you are not sure how to work with an object, try typing summary(tests) and str(tests[[1]]). In this case tests is a list of htest objects, and you want to know the structure of an htest object, not necessarily of the list.
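With many columns it can be hard to tell which p-value belongs to which pair. One option is to name the list elements before extracting, as in the sketch below. This is not from the original answer; the "_vs_" naming scheme and the variable left are assumptions made up for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(a = runif(100), b = runif(100),
                 c = runif(100), d = runif(100))

# Adjacent pairs: a & b, b & c, c & d
left <- seq_len(ncol(df) - 1)
tests <- lapply(left, function(i) t.test(df[[i]], df[[i + 1]]))

# Name each test after the two columns it compares
names(tests) <- paste(names(df)[left], names(df)[left + 1], sep = "_vs_")

# vapply() checks that every extracted p-value is a single number
pvals <- vapply(tests, function(x) x$p.value, numeric(1))
print(pvals)

# Inspect one htest object to see which components are available
str(tests[[1]])
```

The names carry over to pvals, so a single result can be looked up with, for example, pvals["b_vs_c"].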

Hope this helped!

Erik Aronesty · Answer 5 · 2012-11-12T16:28:17.813

0

I run this:

tres<-apply(x,1,t.test)
pval<-vapply(tres, "[[", 0, i = "p.value")

It took me a while to divine the "vapply" trick to pull the pvals out of the t.test result object list. (I edited this from 'sapply' because of Henrik's comment below)

If it's a paired t-test, you can just subtract and test whether the mean difference is 0, which gives exactly the same result (that's all a paired t-test is):

tres<-apply(y-x,1,t.test)
pval<-vapply(tres, "[[", 0, i = "p.value")

Again this is a per-row t-test over all columns.
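The equivalence between a paired t-test and a one-sample test on the differences can be checked directly. The sketch below assumes x and y are numeric matrices of the same shape (the original does not show them, so the simulated data here is hypothetical) and compares the two approaches row by row.

```r
set.seed(7)  # arbitrary seed, only for reproducibility
# Hypothetical data: 5 rows to test, 10 paired observations per row
x <- matrix(rnorm(50, mean = 0), nrow = 5)
y <- matrix(rnorm(50, mean = 1), nrow = 5)

# One-sample t-test on the row-wise differences (tests mean difference = 0)
p_diff <- apply(y - x, 1, function(d) t.test(d)$p.value)

# Explicit paired t-test, row by row
p_paired <- vapply(seq_len(nrow(x)),
                   function(r) t.test(y[r, ], x[r, ], paired = TRUE)$p.value,
                   numeric(1))

# Compare the two sets of p-values up to floating-point tolerance
print(all.equal(p_diff, p_paired))
```

Here the anonymous function returns the p-value directly, so apply() yields a plain numeric vector and no second extraction step is needed.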

edited Nov 12 '12 at 16:28

answered Sep 21 '12 at 20:00

Erik Aronesty


1

Don't use `sapply`, use `vapply`. You don't need the `unlist`, and it will give an error if the data is not as expected. Furthermore, you can use `"[["` as well. So I would do: `vapply(tres, "[[", 0, i = "p.value")` (the `0` just indicates that a numeric should be returned) – Henrik Sep 21 '12 at 20:36
