1

I've been wrecking my head for the past four hours trying to find the solution to an R problem, which is driving me nuts. I've searching everywhere for a decent answer but so far I've been hitting wall after wall. I am now appealing to your good will of this fine community for help.

Consider the following dataset:

set.seed(2112)
DataSample <- matrix(rnorm(24000),nrow=1000)
colnames(DataSample) <- c(paste("Trial",1:12,sep=""),paste("Control",13:24,sep=""))

I need to perform a t-test for every row in DataSample in order to find out if groups TRIAL and CONTROL differ (equal variance applies).

Then I need to count the number of rows with a p-value equal to, or lower than 0.05.

So here is the code I tried, which I know is wrong:

set.seed(2112)
DataSample <- matrix(rnorm(24000),nrow=1000)
colnames(DataSample) <- c(paste("Trial",1:12,sep=""),paste("Control",13:24,sep=""))

pValResults <- apply(
  DataSample[,1:12],1,function(x) t.test(x,DataSample[,13:24], var.equal=T)$p.value
  )

sum(pValResults < 0.05) # Returns the wrong answer (so I was told)

I did try looking at many similar questions around stackoverflow, but I would often end-up with syntax errors or a dimensional mismatch. The code above is the best I could get without returning me an R error -- but I since the code is returning the wrong answer I have nothing to feel proud of.

Any advice will be greatly appreciated! Thanks in advance for your time.

Karolis Koncevičius
  • 9,417
  • 9
  • 56
  • 89
pmdci
  • 270
  • 5
  • 23
  • 1
    You just need to fix up your apply statement. pvalresults <- apply(DataSample,1,function(x){t.test(x[1:12],x[13:24],var.equal = T)$p.value}) – ARobertson Jul 23 '15 at 21:04
  • Ha! your code returned 56 (that is the correct answer when using see 879). While researching for an answer I stumbled onto this code instead, but unfortunately it returns 55. I wonder what is the difference between the two? I do get confused with the use of FUNCTION inside apply (and apply is also news to me. Here is the code I was told to use: `pValResults <- apply(ME,1, function(x) t.test(x[1:12],x[13:24])$p.value) #this is what I used` – pmdci Jul 23 '15 at 21:34
  • 1
    The only difference is that t.test uses var.equal as False by default. My code (and your sample code above) uses var.equal = T meaning the t.test calculates the p values differently. Check the t.test documentation, and it explains it. – ARobertson Jul 23 '15 at 21:41
  • Did you mean **paired**? Neither of the answers you've had so far give you that. It's also worth noting that the `var.equal` parameter doesn't do anything for paired t tests. – Nick Kennedy Jul 24 '15 at 08:29
  • Does this answer your question? [doing t.test for columns for each row in data set](https://stackoverflow.com/questions/28119894/doing-t-test-for-columns-for-each-row-in-data-set) – Karolis Koncevičius May 13 '20 at 21:28

4 Answers4

1

One option is to loop over the data set calculating the t test for each row, but it is not as elegant.

set.seed(2112)
DataSample <- matrix(rnorm(24000),nrow=1000)
colnames(DataSample) <- c(paste("Trial",1:12,sep=""),paste("Control",13:24,sep=""))

# initialize vector of stored p-values
pvalue <- rep(0,nrow(DataSample))

for (i in 1:nrow(DataSample)){
   pvalue[i] <- t.test(DataSample[i,1:12],DataSample[i,13:24])$p.value
}
# finding number that are significant
sum(pvalue < 0.05)
Calvin
  • 1,309
  • 1
  • 14
  • 25
  • Interesting. Your code returned the same value as another code I was suggested to in an IRC chat. Consider using the set.seed(879) instead of 2112 (which the real one that I must use), your code (as well as the code I got on IRC) returns 55. But I was told that the correct answer is 56. – pmdci Jul 23 '15 at 21:38
  • OK something definitely went wrong for me. I closed R then reopened it and now all of a sudden your code returns the correct values. One thing that it was missing from your code is var.equal=T, plus the sum is meant to be for pvalue <= 0.05. However even if I used your code as-is, it is now returning 56 (the correct value when using set 879) as I try it for the second time. – pmdci Jul 23 '15 at 23:00
1

I converted to a data.table, and the answer I got was 45:

DataSample.dt <- as.data.table(DataSample)
sum(sapply(seq_len(nrow(DataSample.dt)), function(x)
    t.test(DataSample.dt[x, paste0('Trial', 1:12), with=F],
           DataSample.dt[x, paste0('Control', 13:24), with=F],
           var.equal=T)$p.value) < 0.05)
Chris Watson
  • 1,347
  • 1
  • 9
  • 24
  • Yes, 45 is the value when using the seed 2112. When using seed 879 I was meant to get 56 and I was getting 55, except with ARobertson's code above which in fact returned 56. Now for whatever reason I restarted R later on and retried all codes and all of them were giving 56. But I can swear I did not perform any changes in the data set for this discrepancy to occur... so I think I am going senile. – pmdci Jul 24 '15 at 14:28
1

To do a paired T test, you need to supply the paired = TRUE parameter. The t.test function isn't vectorised, but it's quite simple to do t tests a whole matrix at a time. Here's three methods (including using apply):

library("genefilter")
library("matrixStats")
library("microbenchmark")
dd <- DataSample[, 1:12] - DataSample[, 13:24]
microbenchmark::microbenchmark(
  manual = {ps1 <- 2 * pt(-abs(rowMeans(dd) / sqrt(rowVars(dd) / ncol(dd))), ncol(dd) - 1)},
  apply = {ps2 <- apply(DataSample, 1, function(x) t.test(x[1:12], x[13:24], paired=TRUE)$p.value)},
  rowttests = {ps3 <- rowttests(dd)[, "p.value"]})
#Unit: milliseconds
#      expr        min         lq       mean     median         uq        max
#    manual   1.611808   1.641783   1.677010   1.663122   1.709401   1.852347
#     apply 390.869635 398.720930 404.391487 401.508382 405.715668 634.932675
# rowttests   2.368823   2.417837   2.639671   2.574320   2.757870   7.207135
# neval
#   100
#   100
#   100

You can see the manual method is over 200x faster than apply.

If you actually meant an unpaired test, here's the equivalent comparison:

microbenchmark::microbenchmark(
  manual = {x <- DataSample[, 1:12]; y <- DataSample[, 13:24]; ps1 <- 2 * pt(-abs((rowMeans(x) - rowMeans(y)) / sqrt((rowVars(x) + rowVars(y)) / ncol(x))), ncol(DataSample) - 2)},
  apply = { ps2 <- apply(DataSample, 1, function(x) t.test(x[1:12], x[13:24], var.equal = TRUE)$p.value)},
  rowttests = {ps3 <- rowttests(DataSample, factor(rep(1:2, each = 12)))[, "p.value"]})

Note the manual method assumes that the two groups are the same sizes.

Nick Kennedy
  • 12,510
  • 2
  • 30
  • 52
  • Oh darn! I am really sorry I made a serious mistake in the title. You are right but it is not meant to be a paired test. I was meant to say that var.equal=T and I got my wires crossed. – pmdci Jul 24 '15 at 14:29
0

Adding an alternative using an external library.

Performing the test:

library(matrixTests)
res <- row_t_equalvar(DataSample[,1:12], DataSample[,13:24])

Format of the result:

res

   obs.x obs.y obs.tot      mean.x       mean.y    mean.diff     var.x     var.y var.pooled    stderr df    statistic     pvalue   conf.low   conf.high alternative mean.null conf.level
1     12    12      24  0.30569721  0.160622830  0.145074376 0.5034806 1.0769678  0.7902242 0.3629105 22  0.399752487 0.69319351 -0.6075559  0.89770469   two.sided         0       0.95
2     12    12      24 -0.27463354 -0.206396781 -0.068236762 0.8133311 0.2807800  0.5470556 0.3019535 22 -0.225984324 0.82329990 -0.6944500  0.55797651   two.sided         0       0.95
3     12    12      24 -0.19805092 -0.023207888 -0.174843032 0.4278359 0.5604078  0.4941219 0.2869733 22 -0.609265949 0.54858909 -0.7699891  0.42030307   two.sided         0       0.95

Number of rows with p <= 0.05:

> sum(res$pvalue <= 0.05)
[1] 4
Karolis Koncevičius
  • 9,417
  • 9
  • 56
  • 89