
For a two-variable problem, outer is most likely the best solution, and if the space to loop over is small enough, we can have expand.grid do our legwork. However, those are ruled out if we have more than two variables and a large space to loop over: outer can't handle more than two variables, and expand.grid eats up more memory than I've ever seen a machine be able to take.
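For context, the two-variable pattern being ruled out might look like this (a small-n sketch; the filtering step is my addition to keep only the triangle):

```r
# Small two-variable case: expand.grid enumerates all pairs,
# then we keep only the triangle a <= b.
n <- 10
g <- expand.grid(a = 1:n, b = 1:n)
g <- g[g$a <= g$b, ]
nrow(g)  # n * (n + 1) / 2 = 55 pairs
```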

I've recently found myself writing code like this:

n <- 1000
for (c in 1:n) {
    for (b in 1:c) {
        for (a in 1:b) {
            if (foo(a, b, c)) {
                bar(a, b, c)
            }
        }
    }
}

In these cases it does seem like a nested loop is the natural solution (e.g. mapply won't do and there are no nice factors for tapply to use), but is there a better way? It seems like a path to bad code.

I suspect that combn might be able to do it somehow, but in my experience it doesn't take long for it to fall into the same memory trap as expand.grid. If memory serves, I've also known it to take the ill-advised step of telling me to make changes to my global settings for recursion limits.
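For what it's worth, combn enumerates combinations without repetition, which doesn't quite match the loop above (where a, b, and c may be equal), and at n = 1000 it would materialize all choose(1000, 3) ≈ 166 million combinations at once, hence the memory trap:

```r
# combn builds the whole matrix of combinations without repetition up front,
# so it hits the same memory wall as expand.grid for large n.
t(combn(4, 3))
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    4
# [3,]    1    3    4
# [4,]    2    3    4
```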

J. Mini
    Is there a name for these sorts of loops? That is, loops that are bounded by the previous loop's current index? I feel like I've missed out on an appropriate tag or magic search engine word. Not to mention, the correct terminology would make for a clearer question title that is less likely to be marked as a duplicate. – J. Mini Jun 05 '20 at 22:50
    this is a great question. Maybe a "triangular" loop (I just made that up)? I wish I knew of a better solution than writing this in Rcpp, but I'm not sure there is one ... – Ben Bolker Jun 05 '20 at 22:53
  • Not sure about the computational efficiency but you could use `expand.grid()` on a = b = c = 1000 and then filter for a <= b <= c. You could then use `apply` functions on the "input" data frame. Edit: just saw you ruled out `expand.grid()`. Very interesting question! – SBFin Jun 05 '20 at 23:00
    @SBFin The problem with ```expand.grid(1:1000,1:1000,1:1000)``` isn't so much efficiency as it is the memory usage. You don't want to be working on something that large. – J. Mini Jun 05 '20 at 23:04
  • This is a dupe question, I think, see if my answer (with `lazyExpandGrid`) helps at all: https://stackoverflow.com/a/36144255/3358272 (and the formalized [gist](https://gist.github.com/r2evans/e5531cbab8cf421d14ed)). – r2evans Jun 05 '20 at 23:55
  • If your function is not fully vectorized, `expand.grid` is the least of your problems (and that could be solved by chunking). If you need a function call for each combination, the main issue is efficiency. Calls to closures are expensive. I'd turn to Rcpp. – Roland Jun 08 '20 at 10:32
  • @Roland ```expand.grid``` could be solved by chunking, even if that is a bit ugly. I'll give Rcpp a look, but what sort of input would you be expecting if I were to fully vectorize this function? Other than for loops, I don't know any way to feed an R function a series of inputs that's structured like the desired one. – J. Mini Jun 08 '20 at 12:32
  • If you use Rcpp, you'd implement `foo` and `bar` in C++ and then use C++ `for` loops equivalent to the R `for` loops you show here. If `foo` and `bar` are simple, that's very easy. I don't want to dive into more details without knowing more about these functions. – Roland Jun 08 '20 at 13:04
  • @Roland I wouldn't worry. In practice, your solution is probably right. But I doubt that "Write the loops in C++ instead" will be taken as a good answer to "how can I avoid writing these loops in R?". – J. Mini Jun 08 '20 at 14:02
    @J.Mini Well, if you can't (and for some functions you can't), that's the best answer you can get. Note that vectorization is implemented with C loops. So, saying you need to implement it in a compiled language seems a valid answer. – Roland Jun 09 '20 at 05:42

4 Answers


This is combinations with repetitions. RcppAlgos is likely your best out-of-the-box option, but at n = 1000L that's just over 167 million combinations (a 3-column integer matrix of over 500 million values) to go through, which will take up ~2 GB of RAM.

library(RcppAlgos)
n = 1000L
mat <- comboGeneral(n, 3L, repetition = TRUE)

Now there are two routes to go. If you have the RAM and your function can be vectorized, you can do the above very quickly. Let's say that if the sum of the combination is greater than 1000 you want the mean of the combination; otherwise you want the sum.

res <- ifelse(rowSums(mat) > 1000L,
              rowMeans(mat),
              rowSums(mat))

## Error: cannot allocate vector of size 1.2 Gb

Oh no! I get the dreaded cannot-allocate-vector error. The FUN argument of comboGeneral allows you to return the result of a function applied to each combination instead. But note that it returns a list and is a lot slower, because it has to evaluate your R function instead of staying in C++. Because of this, I changed to n = 100L because I do not have all day...

comboGeneral(100L, 3L, repetition = TRUE,
             FUN = function(x) {
               if (sum(x) > 100L)
                 mean(x)
               else
                 sum(x)
             })

If I had a static set where I was always choosing combinations of 3 out of n, I would likely use Rcpp code directly, depending on what foo(a,b,c) and bar(a,b,c) are, but first I would like to know more about those functions.
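Not from the answer itself, but if RAM is the bottleneck, the lower/upper arguments of comboGeneral (the same mechanism discussed in the comments below) can be used to process the combinations in chunks; a sketch, reusing the sum/mean rule from above:

```r
library(RcppAlgos)

n <- 100L
chunk_size <- 1e5
total <- comboCount(n, 3L, repetition = TRUE)

res <- 0
for (start in seq(1, total, by = chunk_size)) {
  # Generate only rows [start, end] of the full combination matrix.
  mat <- comboGeneral(n, 3L, repetition = TRUE,
                      lower = start,
                      upper = min(start + chunk_size - 1, total))
  s <- rowSums(mat)
  res <- res + sum(ifelse(s > 100L, rowMeans(mat), s))
}
```

Only one chunk is ever in memory at a time, at the cost of repeated calls into comboGeneral.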

Cole
  • Nice answer as always. I also appreciate the nod. Iterators have been added as of `2.4.0` and I think `comboIter` could help in this situation. See [Combinatorial Iterator](https://jwood000.github.io/RcppAlgos/articles/CombinatoricsIterators.html) for more. – Joseph Wood Jun 09 '20 at 13:08
  • @JosephWood that is very interesting - thanks for the tip of that feature! For 500 million combinations, it might be nice to call it directly from [tag:Rcpp]. Can one do ```for (int i = 0; i < comboCount(1000, 3, repetitions = TRUE); i++) { foo(comboGeneral(1000, 3, lower = i + 1, upper = i+1))}``` directly in Rcpp? – Cole Jun 10 '20 at 00:24
    You are spot on. There is a plan to eventually make all of the features available in C++, like a header only library (e.g. `dqrng`). Unfortunately, that is a ways down the road. – Joseph Wood Jun 10 '20 at 00:41
    This is very impressive and I will continue to play around with it, although ```comboGeneral``` is a pain when you're wanting to stick a condition in the FUN argument like ```FUN =function(x) if(do.call(foo,as.list(x))){x}``` and drop any ```x``` that don't satisfy it (like my original ```for``` loop). I'll look in to ```comboIter```. – J. Mini Jun 10 '20 at 19:05
    @JosephWood ```comboIter``` looks like it might be helpful. As much as I don't like an answer that essentially says "to do this in R, we must get R to do it in C++", if you feel like posting a ```comboIter``` solution, I'm sure that it will be well-received. – J. Mini Jun 10 '20 at 19:45
    ```comboIter``` would look a lot like r2evans' solution, although it is nice that ```comboIter``` also allows you to use a function as well. But, while Joseph Wood's (great!) [tag:RcppAlgos] package uses ```Rcpp```, a solution using ```comboIter``` would mostly be in R. – Cole Jun 11 '20 at 02:00
  • Interesting, I implemented a ```comboIter``` solution and I found that it took about 1.2 times as long to run as the base R nested loop solution and the ```comboGeneral``` solution. I suspect that the issue with using ```comboIter``` is that, to my knowledge, you have to wrap it all in a ```while(!isFALSE(iter))``` block, meaning that you're doing a lot of unnecessary checks for whether you've run out of iterations. – J. Mini Jun 11 '20 at 15:37
    @J.Mini, as Cole points out, essentially when you use `comboIter$nextIter()`, you are operating at the speed of `R`. The only benefit with the iterators is memory efficiency and convenience. I would like to point out, that you can still generate many combinations at the speed of `C++` with the `nextNIter(n)` method, but you are still left looping over combinations in R to check your constraints. I honestly think the best approach is what Cole has provided with a custom `Rcpp` solution if your actual situation causes you to exceed available RAM. – Joseph Wood Jun 11 '20 at 17:13

My previous function lazyExpandGrid is not a perfect match, but I think it addresses your concern about memory-exhaustion. Other languages have the prospect of a lazy iterator; R has it in the iterators package, and since I'm not proficient with it, some time ago I wrote this gist to address an itch.

One problem with lazyExpandGrid is that it expects the factors to be pre-defined. This can be handled with a quick condition, so it'll be memory-efficient though admittedly not speed-efficient. I don't think it'd be a quick fix to implement conditionals in the method, since its mechanism for lazily dealing with the expansion is knowing mathematically which index attaches to which combination of factors ... and conditions will bust that.

Here's how that function can work here:

n <- 3
it <- lazyExpandGrid(aa = 1:n, bb = 1:n, cc = 1:n)
while (length(thistask <- it$nextItem())) {
  if (with(thistask, bb > aa || cc > bb)) next
  print(jsonlite::toJSON(thistask))
}
# [{"aa":1,"bb":1,"cc":1}] 
# [{"aa":2,"bb":1,"cc":1}] 
# [{"aa":3,"bb":1,"cc":1}] 
# [{"aa":2,"bb":2,"cc":1}] 
# [{"aa":3,"bb":2,"cc":1}] 
# [{"aa":3,"bb":3,"cc":1}] 
# [{"aa":2,"bb":2,"cc":2}] 
# [{"aa":3,"bb":2,"cc":2}] 
# [{"aa":3,"bb":3,"cc":2}] 
# [{"aa":3,"bb":3,"cc":3}] 

### to demonstrate what an exhausted lazy-expansion looks like
it$nextItem()
# NULL
it$nextItem()
# NULL

(Note how the conditional with next skips those combinations.)

That would translate to your flow as:

n <- 1000
it <- lazyExpandGrid(aa = 1:n, bb = 1:n, cc = 1:n)
it
# lazyExpandGrid: 3 factors, 1e+09 rows
#   $ index : 0

while (length(thistask <- it$nextItem())) {
  if (with(thistask, bb > aa || cc > bb)) next
  with(thistask, {
    if (foo(aa, bb, cc)) bar(aa, bb, cc)
  })
}

(Or without the with, using thistask$aa, etc.)

Note: I'm not going to lie, though: this simplifies the flow, but it does not make it fast. In this case, doing something 1e+09 times is going to take time, and I don't know of anything that will help with that besides parallel operations and perhaps a friendly cluster of R hosts. (I started running an empty no-op while loop as above, and it took 268 seconds to get through 822K iterations. I hope you have a lot of processing power.)
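If it helps, the "parallel operations" idea could be sketched with the parallel package, splitting the outer index across workers. The foo and bar below are placeholders, not the question's actual functions, and mc.cores should be 1 on Windows:

```r
library(parallel)

# Stand-ins for the question's foo() and bar() (placeholders only).
foo <- function(a, b, c) (a + b + c) %% 7 == 0
bar <- function(a, b, c) a * b * c

n <- 100
# One task per outer index; each worker runs the two inner triangular loops
# and accumulates its own partial result.
res <- mclapply(seq_len(n), function(cc) {
  acc <- 0
  for (bb in seq_len(cc)) {
    for (aa in seq_len(bb)) {
      if (foo(aa, bb, cc)) acc <- acc + bar(aa, bb, cc)
    }
  }
  acc
}, mc.cores = 2L)
sum(unlist(res))
```

This only divides the work; the total number of iterations is unchanged.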

r2evans
  • This is a nice way to remove some unnecessary computations, but I was hoping that R would have a more natural way of solving this. As it stands, I'd say that nested loops make for much more readable code that more clearly does what it says it does, but this solution has a good chance of being computationally better. – J. Mini Jun 06 '20 at 12:33
    R doesn't have a natural way (currently) of lazily defining an expand-grid. This does not make it *computationally* better at all, it just makes "counting" big things more memory-efficient. If you'd prefer multiple nested `for` loops to a single loop, then why is your question titled *"how can I avoid writing nested for loops"*? – r2evans Jun 06 '20 at 14:43
  • I'm serious about that question, @J.Mini: your question title is specifically *"How can I avoid writing nested for loops for large data sets"*, and when presented with a way to reduce your code into one (*not*-nested) loop *without* exhausting memory, you say you prefer nested loops. I believe that when presented with declarative code such as nested `for` loops, *any* effort to "reduce" the nesting will result in subjectively less-readable code. It seems as if your question is not about nesting of the loop as much as doing this one thing as quickly as possible without exhausting memory. – r2evans Jun 12 '20 at 15:38
  • My original hope was for a really nice ```apply``` family solution. I expected that problems like this were common and therefore had a solution in idiomatic R code. That's what made me dislike this solution. It works and deserves its up votes, but it's obviously an unusual approach. – J. Mini Jun 12 '20 at 19:51
  • A `Map`/`mapply`/`purrr::pmap` workflow could be used, but none of them employ lazy-evaluation, so you'll run into the same problems. Especially since you are nesting with dependency (second loop depends on value of first loop's value), I think you're past "clever single-function" to do what you need. (All of this changes if you can subset the data into bite-size chunks that don't require lazy-evaluation.) – r2evans Jun 12 '20 at 21:37

It is important to point out why using Rcpp is both easy and recommended for this.

When we refer to vectorized R code, under the hood it is a bunch of code compiled in C. Thus far, the R developers have not needed to write compiled code that lets arbitrary functions foo() and bar() be used in combinations with repetitions. So as users, we can use an R loop for the flexibility of plain R or, when we have a lot of iterations to loop through, look at compiled alternatives.

The R loop is trivial to make into an Rcpp loop. I have included arbitrary functions just so we can return something (it would have been nice if the OP's post had included something to return as well...):

#include <Rcpp.h>
using namespace Rcpp;

bool foo(int x, int y, int z) {
  return((x + y + z) > 50);
}

int bar(int x, int y, int z) {
  return(x - y + z);
}

// [[Rcpp::export]]
double manual_combos_w_reps(const int n) {
  double ans = 0;

  for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= i; j++) {
      for (int k = 1; k <= j; k++) {
        if (foo(i, j, k)) {
          ans += bar(i, j, k);
        }
      }
    }
  }

  return(ans);
}

Here's the counterpart in R, which is just your code with foo(...) and bar(...) added.

r_foo = function(x, y, z) {
  return((x + y + z) > 50L)
}

r_bar = function (x, y, z) {
  return(x - y + z)
}

r_loop = function (n) {
  ans = 0
  for (i in 1:n) {
    for (j in 1:i) {
      for (k in 1:j) {
        if (r_foo(i, j, k)) {
          ans = ans + r_bar(i, j, k)
        }
      }
    }
  }
  return(ans)
}

Now this is the magic part. Rcpp flies through these iterations. For n = 1000L, it takes R code 360 seconds to run. It takes Rcpp only 0.5 seconds to run.

n = 10L
bench::mark(manual_combos_w_reps(n)
            , r_loop(n)
            )

### A tibble: 2 x 13
##  expression                 min median `itr/sec` mem_alloc
##  <bch:expr>              <bch:> <bch:>     <dbl> <bch:byt>
##1 manual_combos_w_reps(n)  4.7us    5us   178048.    2.48KB
##2 r_loop(n)               1.63ms 1.68ms      505.        0B

n = 100L

### A tibble: 2 x 13
##  expression                min median `itr/sec` mem_alloc
##  <bch:expr>              <bch> <bch:>     <dbl> <bch:byt>
##1 manual_combos_w_reps(n) 467us  469us   2064.      2.48KB
##2 r_loop(n)               627ms  627ms      1.60        0B

n = 1000L

### A tibble: 2 x 13
##  expression                   min   median `itr/sec`
##  <bch:expr>              <bch:tm> <bch:tm>     <dbl>
##1 manual_combos_w_reps(n) 459.29ms 459.39ms   2.18   
##2 r_loop(n)                  6.04m    6.04m   0.00276

You should absolutely look into Rcpp for this: there is no truly canonical answer in base R that will provide high performance for the task you are doing. The only concern is what your actual foo() and bar() functions are, as they may be difficult to implement in C++.
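As a usage note, the C++ above can be compiled and called from R with Rcpp::sourceCpp (the file name here is just an example):

```r
library(Rcpp)

# Save the C++ code above as, say, combos.cpp, then compile and call it.
# Compilation happens once; subsequent calls run at C++ speed.
sourceCpp("combos.cpp")
manual_combos_w_reps(1000L)
```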

Cole
    Nice demonstration! The "tell" will be when we see if `foo` and `bar` are easily `Rcpp`-izable ... since without that, the `Rcpp`-R-`Rcpp` jumping (if needed) will negate some of the speed improvements. – r2evans Jun 11 '20 at 03:25

A purrr solution with .filter also works:

library(purrr)

n <- 10L
levels <- 3L

# keep only elements below the diagonal
isdesc <- function(...) all(diff(unlist(list(...))) <= 0)
# some extra filtering
foo <- function(...) sum(unlist(list(...))) == 27

filter <- function(...) !isdesc(...) | !foo(...)

cross_list <- cross(rep(list(1L:n), levels), .filter = filter)

bar <- function(...) unlist(list(...))

cross_list %>% map(bar)

Unfortunately, like expand.grid, it doesn't scale nicely, because cross first allocates the complete Cartesian product before filtering it.

Waldi