
I am generating a very large data frame made up of a large number of combinations of values. My code therefore has to be as efficient as possible, or else 1) I get errors like "cannot allocate vector of size XX", or 2) the calculations take forever.

I am at the point where I need to calculate r deviations from the mean (r = 3 in the example below) for each sample, with one sample per row of the df. They are labeled dev1 to dev3 in the picture below:

[Image not reproduced: the desired result, with deviation columns dev1 to dev3]

These are my data in R:

[Image not reproduced: the data in R]

I tried this (r is the number of values in each sample, here set to 3):

X2<-apply(X1[,1:r],1,function(x) x-X1$x.bar)

When I try this, I get:

[Image not reproduced: the output of the apply() call]

I am guessing that this code subtracts the entire vector X1$x.bar from each row of X1, instead of subtracting 81 from the 1st row, 81.25 from the 2nd row, etc.
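That guess can be checked on a small case. The sketch below uses a hypothetical 6-row stand-in called toy (made-up values, not the data from the screenshots) and only looks at the shape of what apply() returns with MARGIN = 1:

```r
# Hypothetical stand-in for X1: 6 samples of r = 3 values each.
r <- 3
toy <- data.frame(
  x1 = c(80, 80,    80, 81, 81, 82),
  x2 = c(81, 81.25, 82, 82, 83, 83),
  x3 = c(82, 82.5,  84, 83, 85, 84.5)
)
toy$x.bar <- rowMeans(toy[, 1:r])

# MARGIN = 1 passes one row (length r = 3) at a time, but toy$x.bar is the
# whole column (length 6), so the row is recycled against all 6 means.
wrong <- apply(toy[, 1:r], 1, function(x) x - toy$x.bar)

dim(wrong)        # 6 x 6, not the 6 x 3 block of deviations that is wanted
dim(toy[, 1:r])   # 6 x 3
```

With n rows the result is n x n, which would also explain memory trouble on the full-size data.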

I can easily do this using for loops, but I'm assuming that is not the most efficient way.

Can someone please steer me in the right direction? Any assistance is appreciated.

Here is the whole code for the small sample version with r<-3. WARNING: This computes all possible combinations, so the df's get very large very quickly.

options(scipen = 999)

dp <- function(x) {
    dp1<-nchar(sapply(strsplit(sub('0+$', '', as.character(format(x,  scientific = FALSE))), ".", 
        fixed=TRUE),function(x) x[2]))
    ifelse(is.na(dp1),0,dp1)
}

retain1<-function(x,minuni) length(unique(floor(x)))>=minuni

# =======================================================

r<-3

x0<-seq(80,120,.25)

X0<-data.frame(t(combn(x0,r)))

names(X0)<-paste("x",1:r,sep="")

X<-X0[apply(X0,1,retain1,minuni=r),]

rm(X0)
gc()

X$x.bar<-rowMeans(X)

dp1<-dp(X$x.bar)

X1<-X[dp1<=2,]

rm(X)
gc()

X2<-apply(X1[,1:r],1,function(x) x-X1$x.bar)
Dan

2 Answers


Because R is vectorized, you only need to subtract x.bar from x1, x2, and x3 collectively:

devs <- X1[ , 1:3] - X1[ , 4]
X1devs <- cbind(X1, devs)

That's it...
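For anyone who wants to convince themselves first, here is a minimal sketch on a hypothetical 3-row data frame (toy, toy_devs and the values are made up for illustration; the dev1 to dev3 names follow the question). It works because subtracting a vector of length nrow from a data frame applies the vector down each column, so row i always gets its own mean:

```r
# Hypothetical stand-in for X1 with r = 3 values per sample.
r <- 3
toy <- data.frame(
  x1 = c(80, 80,    81),
  x2 = c(81, 81.25, 82),
  x3 = c(82, 82.5,  83)
)
toy$x.bar <- rowMeans(toy[, 1:r])

# One vectorized subtraction: each column minus the per-row mean.
devs <- toy[, 1:r] - toy$x.bar
names(devs) <- paste0("dev", 1:r)
toy_devs <- cbind(toy, devs)

# Sanity check: deviations from a row's own mean sum to zero in every row.
stopifnot(all(abs(rowSums(devs)) < 1e-9))
```

Using 1:r and the x.bar column name instead of the hard-coded 1:3 and 4 keeps the same line working when r changes.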

SteveM

I think you just got the margin wrong. In apply you're using 1, which is row-wise, but you want to work column-wise, so use 2:

X2<-apply(X1[,1:r], 2, function(x) x-X1$x.bar)

But from what I quickly searched, the apply family isn't better than loops in performance, only in clarity. Check this post: "Is R's apply family more than syntactic sugar?"
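As a rough check that the column-wise apply() gives the intended numbers, the sketch below compares it with two base R alternatives on a hypothetical random matrix (m and x_bar are made-up names, not objects from the question): plain matrix-minus-vector arithmetic and sweep(). It makes no timing claims; wrapping each line in system.time() on the real data would be the way to compare speed.

```r
# Hypothetical data: 6 samples of 3 values each, as a plain numeric matrix.
set.seed(1)
m <- matrix(runif(6 * 3, min = 80, max = 120), ncol = 3)
x_bar <- rowMeans(m)

# Column-wise apply: each column (length 6) minus the 6 row means.
via_apply <- apply(m, 2, function(x) x - x_bar)

# Plain arithmetic: a matrix is stored column by column, so a vector of
# length nrow(m) is recycled down each column in turn.
via_vector <- m - x_bar

# sweep() states the intent explicitly: remove x_bar along rows (MARGIN = 1).
via_sweep <- sweep(m, 1, x_bar, FUN = "-")

stopifnot(
  isTRUE(all.equal(via_apply, via_vector)),
  isTRUE(all.equal(via_vector, via_sweep))
)
```

All three return the same 6 x 3 matrix of deviations here, so the choice mostly comes down to readability and how the object is stored (matrix versus data frame).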