Setup
For my simulation, I generate a list of B = 2000 elements. Each element is the output of one permutation step: I permute the rows of a 200 x 8000 matrix and then, for each column, compute the two-sample Kolmogorov-Smirnov test statistic between the first 100 rows and the last 100 rows (think of the first 100 rows as data from one group and the last 100 rows as data from another group).
Question
Generating the list takes a very long time (about 30-40 minutes). Is there a much faster way? In the future, I'd like to increase B to a larger value.
Code
# Modified KS test function (adapted from ks.test in the stats package);
# it returns only the test statistic. Defined first so the loop below can call it.
ks.test.stat <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n.x <- as.double(length(x))
  n.y <- length(y)
  w <- c(x, y)
  # Step through the pooled sorted values: +1/n.x for an x value, -1/n.y for a y value
  z <- cumsum(ifelse(order(w) <= n.x, 1 / n.x, -1 / n.y))
  z <- z[c(which(diff(sort(w)) != 0), n.x + n.y)]  # exclude ties
  max(abs(z))
}

B <- 2000
n.row <- 200; n.col <- 8000
# Generate sample data
samp.dat <- matrix(rnorm(n.row * n.col), nrow = n.row)
# Preallocate the list that holds one vector of statistics per permutation
perm.KS.list <- vector("list", B)
for (b in 1:B) {
  # Permute the rows
  perm.dat.tmp <- samp.dat[sample(nrow(samp.dat)), ]
  # Compute the permutation-based test statistics, one per column
  perm.KS.list[[b]] <- apply(perm.dat.tmp, 2, function(y) ks.test.stat(y[1:100], y[101:200]))
}
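As a sanity check on the modified function, here is a minimal sketch comparing ks.test.stat with the statistic reported by ks.test on one pair of samples. The function is repeated so the snippet runs on its own; the seed and the names x, y, ref and mine are arbitrary choices for illustration and are not part of the simulation.

```r
# Copy of ks.test.stat from above, repeated so this snippet is self-contained
ks.test.stat <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n.x <- as.double(length(x))
  n.y <- length(y)
  w <- c(x, y)
  z <- cumsum(ifelse(order(w) <= n.x, 1 / n.x, -1 / n.y))
  z <- z[c(which(diff(sort(w)) != 0), n.x + n.y)]  # exclude ties
  max(abs(z))
}

set.seed(1)       # arbitrary seed, only for reproducibility
x <- rnorm(100)   # stands in for the first 100 rows of one column
y <- rnorm(100)   # stands in for the last 100 rows of the same column

ref  <- unname(ks.test(x, y)$statistic)  # statistic from the stats package
mine <- ks.test.stat(x, y)               # statistic from the modified function

# Should be TRUE if the modified function reproduces the two-sided statistic
print(isTRUE(all.equal(mine, ref)))
```

If the two values agree, any speed-up only needs to reproduce the output of ks.test.stat, not the full ks.test result with its p-value.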