I have a data frame (golubdf) with 3051 genes (rows) and 38 columns (samples) belonging to two classes: 27 columns have label 0 and 11 columns have label 1. I need to write a for loop that iterates 500 times. In each iteration the columns of the data frame are shuffled (so the class labels are mixed up), the Wilcoxon rank-sum test is computed on all genes, and the maximum test statistic over all genes is saved:
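# Wilcoxon rank-sum statistic (W) for one gene: x is one row of golubdf,
# s1 and s2 are logical vectors selecting the class-0 and class-1 columns
t.test.all.genes <- function(x, s1, s2) {
  x1 <- as.numeric(x[s1])
  x2 <- as.numeric(x[s2])
  t.out <- wilcox.test(x1, x2, alternative = "two.sided", exact = FALSE, correct = TRUE)
  out <- as.numeric(t.out$statistic)
  return(out)
}
# labels: one class label (0 or 1) per column of golubdf
# Each replicate shuffles the columns, then applies the test to every row (gene)
prs = replicate(500, apply(golubdf[ , sample(ncol(golubdf))], 1,
                           t.test.all.genes, s1 = labels == 0, s2 = labels == 1))
ps.max = apply(prs, 1, max)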
I am not sure whether this is right: should I use rows or columns? Since I need the maximum test statistic over all genes, I used rows (MARGIN = 1). After this, I need to get the 95th-percentile value from the list of maximum test statistics, which is where I am not sure how to get it to work.
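A minimal sketch of the whole procedure on simulated data, assuming genes are rows and labels is a 0/1 vector with one entry per column (the names sim.df, wilcox.stat, n.genes, n.perm and crit are hypothetical, and the sizes are reduced so it runs quickly). It shows the shape of prs and which margin gives one maximum per permutation. Because replicate() stores each permutation's per-gene statistics as one column, prs is genes × permutations. apply(prs, 2, max) therefore gives one maximum per permutation, while apply(prs, 1, max) would give each gene's maximum across permutations. The 95% point is then quantile(ps.max, 0.95).

```r
# Sketch of a max-statistic permutation test on simulated data.
# Assumption: sim.df stands in for golubdf (genes in rows, samples in
# columns) and labels holds one 0/1 class label per column.
set.seed(1)

n.genes <- 200                       # 3051 in the real data
labels  <- c(rep(0, 27), rep(1, 11))
sim.df  <- as.data.frame(matrix(rnorm(n.genes * length(labels)),
                                nrow = n.genes))

# Wilcoxon rank-sum statistic W for one gene (one row)
wilcox.stat <- function(x, s1, s2) {
  x <- as.numeric(x)
  out <- wilcox.test(x[s1], x[s2], alternative = "two.sided",
                     exact = FALSE, correct = TRUE)
  as.numeric(out$statistic)
}

n.perm <- 100                        # 500 in the question
# Each replicate shuffles the columns (equivalent to permuting the labels)
# and applies the test to every row (MARGIN = 1, i.e. every gene).
prs <- replicate(n.perm,
                 apply(sim.df[, sample(ncol(sim.df))], 1, wilcox.stat,
                       s1 = labels == 0, s2 = labels == 1))

dim(prs)                             # genes x permutations: 200 100

# One maximum per permutation = maximum over genes = column maxima
ps.max <- apply(prs, 2, max)

# 95th percentile of the permutation distribution of the maximum statistic
crit <- quantile(ps.max, 0.95)
crit
```

One design point worth checking: W is not centred at zero (under the null hypothesis its mean is 27 * 11 / 2 = 148.5 here), so the plain maximum only picks up genes that tend to be higher in class 0. If both directions matter, one option is to take the maximum of abs(W - 27 * 11 / 2) instead.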