
I previously asked how to create all possible combinations of a set of data frames (the "power set" of the data frames) in this question: Creating Dataframes of all Possible Combinations without Repetition of Columns with cbind

I was able to create the list of possible dataframes by first creating all possible combinations of the names of the dataframes, and storing them in Ccols, a section of which looks like this:

[image: a section of Ccols]
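Since the screenshot is not available, here is a minimal sketch of what a structure like Ccols could look like. It assumes the name combinations were built with `combn`, which the question does not confirm; the data frames `A`, `B`, `C` and the vector `df_names` are hypothetical stand-ins.

```r
# Hypothetical stand-ins for the real data frames (names and contents are assumptions).
set.seed(1)
A <- data.frame(a1 = rnorm(5), a2 = rnorm(5))
B <- data.frame(b1 = rnorm(5))
C <- data.frame(c1 = rnorm(5))

df_names <- c("A", "B", "C")

# Element k is a character matrix with k rows;
# each column holds one combination of k data frame names.
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

# The size-2 combinations, one per column: A/B, A/C, B/C
Ccols[[2]]
```

With this layout, `Ccols[[ii]][i, index]` is the name of the i-th data frame in combination `index` of size `ii`, which matches how the loops below index into Ccols.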

Using Reduce and lapply, I then retrieved each data frame by name and merged the data frames of each combination. I stored the resulting means and covariances in lists, and collected those lists in a list of lists:

ll_cov <- list()
ll_ER <- list()
for (ii in 2:length(Ccols)) {              # combination sizes 2, 3, ...
    l_cov <- list()
    l_ER <- list()
    for (index in 1:ncol(Ccols[[ii]])) {   # one column = one combination of names
        ls <- list()
        for (i in 1:length(Ccols[[ii]][, index])) {
            KK <- get(Ccols[[ii]][i, index])   # look up the data frame by name
            ls[[i]] <- KK
        }
        # merge the data frames of this combination on their row names
        DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
        l_cov[[index]] <- cov(DAT)
        l_ER[[index]] <- colMeans(DAT)
    }
    ll_cov[[ii]] <- l_cov
    ll_ER[[ii]] <- l_ER
}

However, the loop has become too time-consuming because of the large number of data frames being processed and the cov and colMeans calculations. I searched and came across this example (Looping over a list of data frames and calculate the correlation coefficient), which puts the data frames in a list and then applies cov as a function, but that still runs far too slowly. I tried removing one of the loops by replacing the outermost loop with lapply:

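Power_f <- function(X) {
    l_D <- list()
    # start at column 1, as in the loop version; 2:ncol(X) skips the first
    # combination and fails when X has a single column
    for (index in seq_len(ncol(X))) {
        ls <- list()
        for (i in 1:length(X[, index])) {
            KK <- get(X[i, index])   # look up the data frame by name
            ls[[i]] <- KK
        }
        # merge the data frames of this combination on their row names
        DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names = rn, rn = NULL)
        l_D[[index]] <- DAT
    }
    return(l_D)
}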

lapply(seq(from=2,to=(length(Ccols))), function(i) Power_f(Ccols[[i]]))

But it still takes too long to run (I am not getting any results). Is there a way to replace all the for loops with lapply and make this computationally efficient?
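One possible direction, sketched here under stated assumptions rather than as a confirmed fix: swapping for loops for lapply mostly changes the syntax, not the amount of work, so the sketch instead avoids the repeated `get` and `merge` calls. It assumes that all data frames share the same row names in the same order, that column names are unique across data frames, and that there are no missing values. Under those assumptions, the covariance matrix of any combination is a sub-matrix of the covariance matrix of all columns, so `cov` and `colMeans` only need to run once. The data frames `A`, `B`, `C` and the helper `stats_for` are hypothetical.

```r
# Hypothetical example data: three data frames with identical row names (an assumption).
set.seed(42)
rn <- paste0("r", 1:6)
A <- data.frame(a1 = rnorm(6), a2 = rnorm(6), row.names = rn)
B <- data.frame(b1 = rnorm(6), row.names = rn)
C <- data.frame(c1 = rnorm(6), row.names = rn)

df_names <- c("A", "B", "C")
Ccols <- lapply(seq_along(df_names), function(k) combn(df_names, k))

# Fetch every data frame once instead of calling get() inside the loops.
dfs <- mget(df_names)

# Column-bind everything once; valid only because the rows are aligned.
full <- do.call(cbind, unname(dfs))

# Remember which columns belong to which data frame.
cols_of <- lapply(dfs, names)

# Compute the statistics a single time on the full set of columns.
full_cov  <- cov(full)
full_mean <- colMeans(full)

# For one combination of names, pick out the matching rows and columns.
stats_for <- function(combo) {
  cols <- unlist(cols_of[combo], use.names = FALSE)
  list(cov = full_cov[cols, cols, drop = FALSE], ER = full_mean[cols])
}

# One list per combination size (sizes 2 and up), one entry per combination.
ll <- lapply(Ccols[-1], function(m) {
  lapply(seq_len(ncol(m)), function(j) stats_for(m[, j]))
})
```

Here `ll[[1]][[1]]$cov` is the covariance matrix for the first size-2 combination, so the indexing is shifted by one compared with `ll_cov[[2]][[1]]` in the loop version. If the row names differ between data frames, the merge step in the original code drops non-matching rows per combination, and this shortcut no longer gives the same numbers.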

El_1988
  • It might be that the overwhelming majority of time is being spent on the calculations themselves rather than the looping. Have you profiled it? – John Coleman Oct 11 '17 at 14:09
  • Thanks. I was not aware of profiling and am looking it up now, but I removed all the cov and mean calculations and just stored all the data frames in a list, and it still takes too much time to run. – El_1988 Oct 11 '17 at 14:11
  • @JohnColeman Hi, I just tried to use Rprof, but because the code takes too long to run, it seems I cannot see the profiling results until the results are actually generated. – El_1988 Oct 11 '17 at 14:24
  • *I was able to create the list of possible dataframes...* your [question](https://stackoverflow.com/questions/46652276/creating-dataframes-of-all-possible-combinations-without-repetition-of-columns-w0) got two upvotes, meaning the community found it worthwhile and interesting. And yet no answer was posted likely because of no [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) like this interesting question. For future readers, please answer the question yourself since you came up with a solution. – Parfait Oct 11 '17 at 14:35
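On the profiling point raised in the comments: `Rprof` writes its samples to a file, and `summaryRprof` reads that file once profiling has been stopped with `Rprof(NULL)`, so one option is to profile a run that is small enough to finish. The sketch below uses a hypothetical stand-in workload named `work`; in practice it would be replaced by the real loop restricted to a few elements of Ccols.

```r
# Hypothetical stand-in for the slow code: repeats cov() for about half a second.
# Replace the body with the real loop run on a small subset, e.g. Ccols[2:3].
work <- function(seconds = 0.5) {
  m <- matrix(rnorm(200 * 20), ncol = 20)
  t0 <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - t0 < seconds) cov(m)
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)   # start sampling the call stack into prof_file
work()
Rprof(NULL)        # stop profiling so the file is complete

# Time attributed to each function itself, most expensive first.
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

If `merge` or `data.frame` dominates the `by.self` table on the reduced run, that suggests the data handling, rather than `cov` or the looping construct, is where most of the time goes.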

0 Answers