I have a data set with around 1000 columns (parameters) and want to run a linear regression between every pair of them: column 1 against each of the other 999 columns, then column 2 against the remaining ones, and so on.

The nonoptimized version of this approach would be:

n <- ncol(Data)
for (column in seq_len(n - 1)) {

    # Pair the current column with every column to its right
    for (nextColumn in (column + 1):n) {

        # Analysis logic for Data[, column] vs. Data[, nextColumn]

    }
}

The code above works, but it takes a lot of time. To speed it up, I want to use parallel processing in R. Several packages could be useful here, for example parallel and doParallel, as explained in this question.
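As a rough sketch of what the parallel route could look like, the column pairs can be generated up front with combn() and handed to parLapply() from the base parallel package. Here mtcars stands in for Data, the helper name fit_pair and the choice of two workers are assumptions made for illustration, and the per-pair result (slope and R-squared) is only a placeholder for the real analysis logic.

```r
library(parallel)

Data <- mtcars  # stand-in for the real data set (assumption)

# All unordered column pairs (i < j), one pair per list element
pairs <- combn(ncol(Data), 2, simplify = FALSE)

# Hypothetical helper: regress column p[1] on column p[2]
fit_pair <- function(p, Data) {
  fit <- lm(Data[[p[1]]] ~ Data[[p[2]]])
  c(response  = p[1],
    predictor = p[2],
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}

cl <- makeCluster(2)  # two workers (assumption); adjust to the machine
results <- parLapply(cl, pairs, fit_pair, Data = Data)
stopCluster(cl)

# One row per column pair
results <- do.call(rbind, results)
```

For fits as cheap as a two-column lm(), starting the workers and shipping the data to them may cost more than it saves, so it is probably worth timing this against a plain lapply(pairs, fit_pair, Data = Data) before committing to it.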

However, there might be some overhead involved that, as a new R programmer, I am not aware of. I am looking for suggestions from R experts on a better way to write the above code in R, and on whether any specific package would be useful.

Looking forward to suggestions, thanks.

Chetan Arvind Patil
  • 1
    If you are new to R, I wouldn't try to parallelize your code. Call your regression function inside apply(). Have a look at the following [link](https://stackoverflow.com/questions/20342661/apply-in-r-with-user-defined-function). – user 123342 Jun 27 '17 at 18:33
  • @JamieMac: Thanks. I am having a hard time figuring out how `apply()` would fetch two columns/parameters at a time, perform the regression, and move on to the next combination. Currently, in the analysis logic I am also capturing all the `summary()` data, so I have a vector that keeps updating while the regression loops through the different columns/parameters. I am still reading through the `apply()` documentation, but if you have any suggestions, please do share. – Chetan Arvind Patil Jun 28 '17 at 20:09

1 Answer

Use mapply, which walks the two index vectors in step; sum() here stands in for your analysis logic:

X <- 1:(ncol(mtcars)-1)     # first through penultimate column
Y <- 2:ncol(mtcars)         # second through last column
mapply(function(x,y) sum(mtcars[,x],mtcars[,y]), X, Y)
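Note that pairing X[i] with Y[i] this way only matches each column with its immediate neighbour. If every pair of columns is needed, as in the question's nested loop, one option is to build the index pairs with combn() and apply a function over them. A minimal sketch, assuming a simple lm() per pair and mtcars as stand-in data; the helper name pair_r2 and the choice of R-squared as the collected result are illustrative assumptions:

```r
Data <- mtcars  # stand-in for the real data set (assumption)

# Hypothetical helper: R-squared from regressing column idx[1] on column idx[2]
pair_r2 <- function(idx) {
  fit <- lm(Data[[idx[1]]] ~ Data[[idx[2]]])
  summary(fit)$r.squared
}

# 2 x choose(n, 2) matrix: each column holds one unordered pair of column indices
idx <- combn(ncol(Data), 2)

# Run the helper once per pair
r2 <- apply(idx, 2, pair_r2)

# Label each result with the column names involved
names(r2) <- paste(names(Data)[idx[1, ]], names(Data)[idx[2, ]], sep = "~")
```

The result is a named numeric vector, so something like head(sort(r2, decreasing = TRUE)) would list the most strongly related pairs first.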
CPak