Here is my data: 900000 observations of 9 variables. I tried the apply function, but I could not work out how to pass parameters through it. The data looks like this:

ID A1 A2 A3 A4 A5 B1 B2 B3 B4
1  10 12 11 13 15 50 55 56 57
2  20 22 23 21 20 60 76 78 71
3  10 12 13 15 14 50 55 52 53
...
900000 11 12 13 15 12 21 22 23 24

I need to run a two-sample Student's t-test 900000 times, once per row, with the 9 variables divided into 2 groups (group A and group B). Can anyone post code for this?

Edit: Thanks for the comments. I have made the following change and added sample data:

testx <- structure(list(RAS = c(0.554246173201929, 0.292104162206435, 
0.201932255556074), RASSYX2 = c(0.673628450549317, 0.370730964566956, 
0.240868661848041), RASSYX3 = c(0.592972062397773, 0.387737676651884, 
0.258971711587807)), .Names = c("RAS", "RASSYX2", "RASSYX3"), row.names =c(NA, 
3L), class = "data.frame")

testy <- structure(list(test2 = c(0.682230776398731, 0.299007374701463, 
0.21735652533812), test3 = c(0.660308325914822, 0.340956947569367, 
0.255153956615115), test4 = c(0.625506839884405, 0.281695127521423, 
0.265769288207206)), .Names = c("test2", "test3", "test4"), row.names = c(NA, 
3L), class = "data.frame")

Row 1 of testx should be compared with row 1 of testy, and so on for all 900000 rows. I just need to automate this test so it runs 900000 times. I want a two-sided, equal-variance t-test at a 95% confidence level.

I tried this, but apparently the y argument is not what I want to test against:

apply(testx,1,t.test,testy)
  • Show the code you've attempted and describe how it doesn't work. We're here to help you with code, not write it for you. You should always make your [examples reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Apr 25 '16 at 18:57
  • At least provide the code for 1 line of data. Is this left-, right-, or two-sided? Equal or unequal variance? What is your confidence level? etc... – Dave2e Apr 25 '16 at 19:05
  • Or: sapply(1:n, FUN = function(i){t.test(testx[i,], testy[i,], alternative = "two.sided", var.equal = TRUE)$p.value}), where n is the number of rows. – Dave2e Apr 25 '16 at 21:05
  • Thanks Dave2e, it works very well! – rudy Apr 28 '16 at 17:44
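
The row-wise call from the comment above can be sketched end to end as follows. This is only a minimal sketch under stated assumptions: both groups are held as numeric matrices with the same number of rows and no missing values, the matrices `x` and `y` are made-up stand-ins for the real data, and the helper name `row_t_pvalues` is hypothetical. Because calling t.test once per row may be slow for 900000 rows, the sketch also computes the same pooled-variance (equal-variance) statistic for all rows at once and checks that both approaches agree.

```r
# Made-up stand-in data: 5 rows; group A has 5 columns, group B has 4
set.seed(42)
x <- matrix(rnorm(5 * 5, mean = 10), nrow = 5)
y <- matrix(rnorm(5 * 4, mean = 11), nrow = 5)

# Per-row call to t.test, as in the comment above
p_loop <- sapply(seq_len(nrow(x)), function(i) {
  t.test(x[i, ], y[i, ], alternative = "two.sided", var.equal = TRUE)$p.value
})

# Hypothetical helper: the same equal-variance test for all rows at once
row_t_pvalues <- function(x, y) {
  nx <- ncol(x)
  ny <- ncol(y)
  mx <- rowMeans(x)
  my <- rowMeans(y)
  vx <- rowSums((x - mx)^2) / (nx - 1)  # per-row sample variance of x
  vy <- rowSums((y - my)^2) / (ny - 1)  # per-row sample variance of y
  df <- nx + ny - 2
  pooled <- ((nx - 1) * vx + (ny - 1) * vy) / df
  t_stat <- (mx - my) / sqrt(pooled * (1 / nx + 1 / ny))
  2 * pt(-abs(t_stat), df)              # two-sided p-value
}

p_vec <- row_t_pvalues(x, y)
all.equal(p_loop, p_vec)
```

For data frames such as testx and testy, as.matrix(testx) and as.matrix(testy) could be passed to the helper instead.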

1 Answer

Thanks for clarifying your question. I wrote the following solution before your clarification using simulated data.

Here is the simulated data set. If your data is in wide form, you should really consider getting it into long form, unless you're doing a paired test, which you did not mention.

set.seed(1)

d<-data.frame(PatID=1:100, 
              group=rep(c('A','B'),50),
              Var1=rnorm(100, 500, 20),
              Var2=rnorm(100, 500, 20),
              Var3=rnorm(100, 500, 20),
              Var4=rnorm(100, 500, 20))

And now we loop through a list of column names we want to test and perform the test.

vars_to_test<-c('Var1','Var2','Var3','Var4')

t_res<-lapply(vars_to_test, function(var){ t.test( d[,var] ~ d[,'group'])})

names(t_res)<-vars_to_test

t_res is now a list of test results (each one is itself a list), with one element per t-test. Because I named the elements of t_res, I can easily access the result for any of my variables.

For example, this is the p-value of the t-test for a difference in mean Var1 between groups A and B:

> t_res[['Var1']]$p.value
[1] 0.3373045
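
To pull every p-value out of a list like t_res in one step, sapply can be mapped over it. The sketch below rebuilds the simulated data so that it runs on its own. It also passes var.equal = TRUE, which is a change from the call above: the question asks for an equal-variance test, and t.test otherwise defaults to the Welch (unequal-variance) version. The name `p_values` is only illustrative.

```r
# Same simulated data as above
set.seed(1)
d <- data.frame(PatID = 1:100,
                group = rep(c("A", "B"), 50),
                Var1 = rnorm(100, 500, 20),
                Var2 = rnorm(100, 500, 20),
                Var3 = rnorm(100, 500, 20),
                Var4 = rnorm(100, 500, 20))
vars_to_test <- c("Var1", "Var2", "Var3", "Var4")

# One equal-variance t-test per variable, comparing group A with group B
t_res <- lapply(vars_to_test, function(var) {
  t.test(d[, var] ~ d[, "group"], var.equal = TRUE)
})
names(t_res) <- vars_to_test

# Named numeric vector with one p-value per tested variable
p_values <- sapply(t_res, function(res) res$p.value)
p_values
```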
  • Thanks a lot, Arman. Yes, you're right, I should convert from wide to long form, and it works. – rudy Apr 25 '16 at 19:52
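
The wide-to-long conversion discussed here can be sketched in base R. This is one possible approach, not necessarily the one the asker used. It takes the first three rows of the table in the question as input and assumes the group is given by the first letter of each column name; the names `wide`, `long`, and `p_by_id` are illustrative.

```r
# First three rows of the question's table, in wide form
wide <- data.frame(ID = 1:3,
                   A1 = c(10, 20, 10), A2 = c(12, 22, 12), A3 = c(11, 23, 13),
                   A4 = c(13, 21, 15), A5 = c(15, 20, 14),
                   B1 = c(50, 60, 50), B2 = c(55, 76, 55), B3 = c(56, 78, 52),
                   B4 = c(57, 71, 53))

# Stack the value columns: one row per (ID, measurement) pair
value_cols <- setdiff(names(wide), "ID")
long <- data.frame(
  ID    = rep(wide$ID, times = length(value_cols)),
  group = rep(substr(value_cols, 1, 1), each = nrow(wide)),  # "A" or "B"
  value = unlist(wide[value_cols], use.names = FALSE)
)

# One two-sided, equal-variance t-test per ID, comparing group A with group B
p_by_id <- sapply(split(long, long$ID), function(rows) {
  t.test(value ~ group, data = rows, var.equal = TRUE)$p.value
})
p_by_id
```

The result is a numeric vector named by ID, with one p-value per original row.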