Calculate Means and Covariances for large list of dataframes, replacing loops with lapply

Question

I previously asked how to create all possible combinations of a set of data frames (the "power set" of data frames) in this question: Creating Dataframes of all Possible Combinations without Repetition of Columns with cbind

I was able to create the list of possible dataframes by first creating all possible combinations of the names of the dataframes, and storing them in Ccols, a section of which looks like this:

Using Reduce and lapply, I then fetched each data frame by its name, merged the data frames of each combination into one, and stored the covariance matrix and column means of every merged data frame in a list of lists:

ll_cov <- list()
ll_ER <- list()
for (ii in 2:length(Ccols)) {
  l_cov <- list()
  l_ER <- list()
  # Each column of Ccols[[ii]] is one combination of data frame names
  for (index in 1:ncol(Ccols[[ii]])) {
    # Fetch every data frame named in this combination
    ls <- list()
    for (i in 1:length(Ccols[[ii]][, index])) {
      KK <- get(Ccols[[ii]][i, index])
      ls[[i]] <- KK
    }
    # Merge the data frames on their row names
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
    l_cov[[index]] <- cov(DAT)
    l_ER[[index]] <- colMeans(DAT)
  }
  ll_cov[[ii]] <- l_cov
  ll_ER[[ii]] <- l_ER
}
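
To make the merge step in the loop above easier to follow, here is a minimal sketch of what it does for a single combination. The data frames A and B and their contents are hypothetical and are not taken from my data.

```r
# Two hypothetical data frames that share some, but not all, row names.
A <- data.frame(a1 = c(1, 2, 3), row.names = c("r1", "r2", "r3"))
B <- data.frame(b1 = c(10, 20, 30), row.names = c("r2", "r3", "r4"))

# Expose the row names as an ordinary column so merge() can join on it.
with_rn <- lapply(list(A, B), function(x) data.frame(x, rn = row.names(x)))

# Reduce() folds merge() over the list. merge() joins on the only shared
# column, "rn", and keeps just the rows present in every data frame.
merged <- Reduce(merge, with_rn)

# Move "rn" back into the row names and drop the helper column.
DAT <- transform(merged, row.names = rn, rn = NULL)

cov(DAT)       # covariance matrix of the merged columns a1 and b1
colMeans(DAT)  # column means of the merged columns
```

Here only rows r2 and r3 survive the merge, so cov and colMeans are computed on the rows that all data frames in the combination have in common.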

However, the loop is becoming too time-consuming because of the large number of data frames being processed and the cov and colMeans calculations. I searched and came across this example (Looping over a list of data frames and calculate the correlation coefficient), which suggests putting the data frames in a list and then applying cov as a function, but it is still running far too slowly. I tried removing one of the loops by replacing the outermost loop with lapply:

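Power_f <- function(X) {
  l_D <- list()
  # Each column of X is one combination, so start at column 1 as in the loop version
  for (index in 1:ncol(X)) {
    # Fetch every data frame named in this combination
    ls <- list()
    for (i in 1:length(X[, index])) {
      KK <- get(X[i, index])
      ls[[i]] <- KK
    }
    # Merge the data frames on their row names
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
    l_D[[index]] <- DAT
  }
  return(l_D)
}

lapply(seq(from = 2, to = length(Ccols)), function(i) Power_f(Ccols[[i]]))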

But it is still taking too long to run (I am not getting any results). Is there a way to replace all the for loops with lapply and make this computationally efficient?
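
For reference, this is a rough sketch of what a version written only with lapply might look like. The toy data frames A, B and C, the helper names merge_combo and stats_for, and the assumption that Ccols[[k]] is the matrix returned by combn() for combinations of size k are all hypothetical. I would not expect lapply by itself to be much faster than a for loop; the sketch mainly avoids calling get() repeatedly by collecting the data frames once.

```r
# Hypothetical toy data: three data frames with identical row names.
set.seed(1)
rows <- paste0("r", 1:5)
frames <- list(
  A = data.frame(a1 = rnorm(5), a2 = rnorm(5), row.names = rows),
  B = data.frame(b1 = rnorm(5), row.names = rows),
  C = data.frame(c1 = rnorm(5), c2 = rnorm(5), row.names = rows)
)
# With data frames that live in the workspace, mget(df_names) would build
# the same kind of named list in a single call.
df_names <- names(frames)

# Assumed shape of Ccols: element k is a matrix with one combination of
# k data frame names per column.
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

# Merge the data frames named in one combination on their row names.
merge_combo <- function(nms) {
  parts <- lapply(frames[nms], function(x) data.frame(x, rn = row.names(x)))
  transform(Reduce(merge, parts), row.names = rn, rn = NULL)
}

# For one element of Ccols, compute cov and colMeans for every column.
stats_for <- function(X) {
  lapply(seq_len(ncol(X)), function(j) {
    DAT <- merge_combo(X[, j])
    list(cov = cov(DAT), ER = colMeans(DAT))
  })
}

# Skip the single-name combinations, as the outer loop starting at 2 does.
res <- lapply(Ccols[-1], stats_for)
```

Because the first element of Ccols is dropped, res[[1]] corresponds to Ccols[[2]]; for example, res[[1]][[2]]$cov is the covariance matrix for the second pair of data frames.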

It might be that the overwhelming majority of time is being spent on the calculations themselves rather than the looping. Have you profiled it? — John Coleman, Oct 11 '17 at 14:09
Thanks. I am not familiar with profiling and am looking it up now, but I removed all the cov and mean calculations and just had it store all the data frames in a list, and it still takes too long to run. — El_1988, Oct 11 '17 at 14:11
@JohnColeman Hi, I just tried to use Rprof, but because the code takes too long to run, it seems I cannot see the profiling results until the run actually finishes. — El_1988, Oct 11 '17 at 14:24
*I was able to create the list of possible dataframes...* your [question](https://stackoverflow.com/questions/46652276/creating-dataframes-of-all-possible-combinations-without-repetition-of-columns-w0) got two upvotes, meaning the community found it worthwhile and interesting. And yet no answer was posted, likely because it had no [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), like this interesting question. For future readers, please answer the question yourself since you came up with a solution. — Parfait, Oct 11 '17 at 14:35
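
Regarding the profiling suggestion in the comments: one way to get Rprof output without waiting for the full run might be to profile a small, fixed number of repetitions of a single unit of work. The sketch below assumes hypothetical data frames A and B and a hypothetical helper one_combo that stands in for one merge plus cov call; it is not the actual data.

```r
# Hypothetical stand-in for one unit of work: merge two data frames on
# their row names, then compute the covariance matrix.
set.seed(1)
n <- 20000
rows <- paste0("r", seq_len(n))
A <- data.frame(a1 = rnorm(n), a2 = rnorm(n), row.names = rows)
B <- data.frame(b1 = rnorm(n), row.names = rows)

one_combo <- function() {
  parts <- lapply(list(A, B), function(x) data.frame(x, rn = row.names(x)))
  DAT <- transform(Reduce(merge, parts), row.names = rn, rn = NULL)
  cov(DAT)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                    # start the sampling profiler
for (k in 1:20) out <- one_combo()  # a small, fixed number of repetitions
Rprof(NULL)                         # stop profiling

# Time attributed to each function, sorted by total time.
prof <- summaryRprof(prof_file)
head(prof$by.total)

# Elapsed time for a single unit, which can be multiplied by the number of
# combinations to estimate the cost of the full run.
system.time(one_combo())
```

If merge shows up at the top of by.total, the time is going into the joins rather than into cov or colMeans, which would be consistent with the run staying slow after those calculations were removed.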
