The trick is to use apply() to calculate all the row statistics at once and then to do the operations column-wise, like so:
# calculate the row means and SDs using apply()
theta.means <- apply(samples[,37:2574], # the object to summarize
                     1,                 # summarize over the rows (MARGIN = 1)
                     mean)              # the summary function
theta.sds <- apply(samples[,37:2574], 1, sd)
# define a function to apply for each row
standardize <- function(x)
(x - mean(x))/sd(x)
# apply it to each row (MARGIN = 1); apply() returns one column per row, so transpose the result with t()
samples[,37:2574] <- t(apply(samples[,37:2574],1,standardize))
# rescale columns 1-18 by the row SDs, then subtract theta.means * (rescaled) columns 1-18 from columns 19-36
for (j in 1:18){
samples[, j] <- samples[,j] * theta.sds
theta.mean.beta <- theta.means * samples[, j]
samples[, j + 18] <- samples[, j + 18] - theta.mean.beta
}
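The `t()` in the standardization step is easy to miss: with `MARGIN = 1`, apply() stacks each row's result as a *column* of the output. A tiny sketch on a made-up 2 x 3 matrix `m` (illustrative only, not the asker's data) shows the orientation flip:

```r
# Toy 2 x 3 matrix: two rows, three columns
m <- matrix(c(1, 2, 3,
              4, 6, 8), nrow = 2, byrow = TRUE)

standardize <- function(x) (x - mean(x)) / sd(x)

# apply() with MARGIN = 1 returns one column per input row, so this is 3 x 2
raw <- apply(m, 1, standardize)
dim(raw)  # 3 2

# t() restores the original 2 x 3 orientation
z <- t(apply(m, 1, standardize))
dim(z)    # 2 3
z         # both rows become -1 0 1
```

Without the `t()`, assigning `raw` back into `samples[,37:2574]` would either fail or silently scramble values whenever the dimensions happen to line up.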
Be sure to double-check that this code is equivalent to your original code by taking a subset of rows (e.g. `samples <- samples[1:100, ]`) and checking that the results are the same (I would have done this myself, but no example dataset was posted...).
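One way to do that subset check is to wrap each version in a function and compare the outputs with all.equal(), which tolerates tiny floating-point differences. The helper `compare_on_subset()` below is hypothetical, and the toy usage compares two row-mean implementations on simulated data rather than the asker's code:

```r
# Hypothetical helper: run two implementations on the first n rows of a
# data set and report whether their results agree numerically
compare_on_subset <- function(f_old, f_new, data, n = 100) {
  sub <- data[seq_len(min(n, nrow(data))), , drop = FALSE]
  all.equal(f_old(sub), f_new(sub))
}

# Toy usage on simulated data: row means via apply() vs rowMeans()
set.seed(1)
toy <- as.data.frame(matrix(rnorm(500 * 5), ncol = 5))

res <- compare_on_subset(
  function(d) apply(d, 1, mean),
  function(d) rowMeans(as.matrix(d)),
  toy
)
res  # TRUE if they agree; otherwise a description of the differences
```

Comparing with all.equal() rather than identical() is deliberate: reordering arithmetic (as the vectorized versions do) can change the last few bits of a result without changing its meaning.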
UPDATE:
Here's a more efficient implementation based on David Arenburg's comments below:
# calculate the row means via rowMeans()
theta.means <- rowMeans(as.matrix(samples[,37:2574]))
# define a row-wise (vectorized) standard deviation for a numeric matrix
rowSD <- function(x)
  sqrt(rowSums((x - rowMeans(x))^2)/(ncol(x) - 1))
# calculate the row SDs using the vectorized version of SD
theta.sds <- rowSD(as.matrix(samples[,37:2574]))
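A quick sanity check that `rowSD()` matches the per-row sd() it replaces, on a simulated matrix (the data here is random, not from the question). The function is copied from above:

```r
# Row-wise sample SD (denominator n - 1, same as sd())
rowSD <- function(x)
  sqrt(rowSums((x - rowMeans(x))^2) / (ncol(x) - 1))

set.seed(42)
x <- matrix(rnorm(20 * 8), nrow = 20)

# x - rowMeans(x) recycles the length-20 vector down each column,
# so every row has its own mean subtracted
all.equal(rowSD(x), apply(x, 1, sd))  # should be TRUE
```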
Now use the fact that when you subtract a vector `x` from a data.frame `df`, R recycles the values of `x` down the columns. So when `length(x) == nrow(df)`, the result is the same as subtracting `x` from each column of `df`:
# standardize each row of columns 37 through 2574 (row i uses theta.means[i] and theta.sds[i])
samples[,37:2574] <- (samples[,37:2574] - theta.means)/theta.sds
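A small sketch of that recycling rule with a made-up two-column data frame (`df` and `x` are illustrative names), followed by the same row-standardization trick used above:

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
x  <- c(1, 2, 3)   # one value per row: length(x) == nrow(df)

# x is recycled down each column, so row i loses x[i] in every column
res <- df - x
res$a  # 0 0 0
res$b  # 9 18 27

# the same idea standardizes every row: subtract row means, divide by row SDs
z <- (df - rowMeans(df)) / apply(df, 1, sd)
rowMeans(z)  # numerically zero for every row
```

The rule only lines up with rows because R fills column by column; a vector whose length equals `ncol(df)` would *not* be matched to columns this way.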
Now do similar calculations for columns 1:18 and 19:36:
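# rescale columns 1-18 by the row SDs, then subtract theta.means * (rescaled) columns 1-18 from columns 19-36
samples[, 1:18] <- samples[, 1:18] * theta.sds
# columns 1-18 are already multiplied by theta.sds, so don't multiply by it again here
samples[, 19:36] <- samples[, 19:36] - theta.means * samples[, 1:18]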
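To check that the loop version and the vectorized update agree, here is a sketch on simulated data. The data frame `toy_samples`, the column range `theta_cols` (standing in for `37:2574`), and the wrapper functions `loop_version()` and `vectorized_version()` are all hypothetical. In both versions, columns 19:36 have `theta.means` times the *already rescaled* columns 1:18 subtracted, which is what the original loop does:

```r
# Simulated stand-in for `samples`: 50 rows, 36 leading columns plus
# 12 "theta" columns (the real data uses columns 37:2574)
set.seed(123)
toy_samples <- as.data.frame(matrix(rnorm(50 * 48), nrow = 50))
theta_cols  <- 37:48

# Version 1: apply() + loop, as in the first part of the answer
loop_version <- function(samples, theta_cols) {
  theta.means <- apply(samples[, theta_cols], 1, mean)
  theta.sds   <- apply(samples[, theta_cols], 1, sd)
  standardize <- function(x) (x - mean(x)) / sd(x)
  samples[, theta_cols] <- t(apply(samples[, theta_cols], 1, standardize))
  for (j in 1:18) {
    samples[, j] <- samples[, j] * theta.sds
    samples[, j + 18] <- samples[, j + 18] - theta.means * samples[, j]
  }
  samples
}

# Version 2: rowMeans()/rowSD() + recycling, as in the update
vectorized_version <- function(samples, theta_cols) {
  rowSD <- function(x)
    sqrt(rowSums((x - rowMeans(x))^2) / (ncol(x) - 1))
  theta.means <- rowMeans(as.matrix(samples[, theta_cols]))
  theta.sds   <- rowSD(as.matrix(samples[, theta_cols]))
  samples[, theta_cols] <- (samples[, theta_cols] - theta.means) / theta.sds
  samples[, 1:18]  <- samples[, 1:18] * theta.sds
  samples[, 19:36] <- samples[, 19:36] - theta.means * samples[, 1:18]
  samples
}

all.equal(loop_version(toy_samples, theta_cols),
          vectorized_version(toy_samples, theta_cols))  # should be TRUE
```

Multiplying by `theta.sds` a second time in the last step of version 2 would make this comparison fail, which is exactly the kind of slip a subset check is meant to catch.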