2

Is there a way to allocate an Rcpp List of length n, where each element of the List will be filled with a NumericMatrix, but the size of each NumericMatrix can change?

I have an idea for doing this using std::list and push_back(), but the size of the list may be quite large and I want to avoid the overhead of creating an extra copy of the list when I return from the function.

The below R code gives an idea of what I hope to do:

myvec = function(n) {
  x = vector("list", n)
  for (i in seq_len(n)) {
    nc = sample(1:3, 1)
    nr = sample(1:3, 1)
    x[[i]] = matrix(rbinom(nc * nr, size = 1, prob = 0.5),
                    nrow = nr, ncol = nc)
  }
  x
}

This could result in something like:

> myvec(2)
[[1]]
     [,1]
[1,]    0
[2,]    1

[[2]]
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    1    1

Update: based on the comments of @Dirk and @Ralf, I created functions based on Rcpp::List and std::list with a wrap at the end. Speed comparisons don't seem to favor one version over the other, but perhaps there's an inefficiency I'm not aware of.

src = '
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
  Rcpp::RNGScope rngScope;
  Rcpp::List x(n);
  // Rcpp::IntegerVector choices = {1, 2 ,3};
  Rcpp::IntegerVector choices = Rcpp::seq_len(50);
  for (int i = 0; i < n; ++i) {
    int nc = Rcpp::sample(choices, 1).at(0);
    int nr = Rcpp::sample(choices, 1).at(0);
    Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
    x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
  }
  return x;
}

// [[Rcpp::export]]
Rcpp::List myvec2(int n) {
  Rcpp::RNGScope scope;
  std::list< Rcpp::NumericMatrix > x;
  // Rcpp::IntegerVector choices = {1, 2 ,3};
  Rcpp::IntegerVector choices = Rcpp::seq_len(50);
  for (int i = 0; i < n; ++i) {
    int nc = Rcpp::sample(choices, 1).at(0);
    int nr = Rcpp::sample(choices, 1).at(0);
    Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
    x.push_back( Rcpp::NumericMatrix(nc, nr, entries.begin()));
  }
  return Rcpp::wrap(x);
}
'
sourceCpp(code = src)

Resulting benchmarks on my computer are:

> library(microbenchmark)
> rcpp_list = function() {
+   set.seed(10);myvec(105)
+ }
> std_list = function() {
+   set.seed(10);myvec2(105)
+ }
> microbenchmark(rcpp_list(), std_list(), times = 1000)
Unit: milliseconds
        expr    min      lq     mean  median      uq
 rcpp_list() 1.8901 1.92535 2.205286 1.96640 2.22380
  std_list() 1.9164 1.95570 2.224941 2.00555 2.32315
    max neval cld
 7.1569  1000   a
 7.1194  1000   a
jfrench
  • 315
  • 1
  • 10

2 Answers2

5

The fundamental issue that Rcpp objects are R objects governed my R's memory management where resizing is expensive: full copies.

So when I have tasks similar to yours where sizes may change, or are unknown, I often work with different data structures -- the STL gives us plenty -- and only convert to R(cpp) at the return step at the end.

The devil in the detail here (as always). Profile, experiment, ...

Edit: And in the narrower sense of "can we return a List of NumericMatrix objects with varying sizes" the answer is of course we can because that is what List objects do. You can also insert other types.

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • Dirk, thank you for the prompt response. When you said, "only convert to R(cpp) at the ```return``` step at the end"", I'm assuming you mean using the ```Rcpp::wrap``` function? – jfrench Jan 13 '20 at 21:36
2

As Dirk said, it is of course possible to create a list with matrices of different size. To make it a bit more concrete, here a translation of your R function:

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
    Rcpp::List x(n);
    Rcpp::IntegerVector choices = {1, 2 ,3};
    for (int i = 0; i < n; ++i) {
        int nc = Rcpp::sample(choices, 1).at(0);
        int nr = Rcpp::sample(choices, 1).at(0);
        Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
        x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
    }
    return x;
}

/***R
myvec(2)
*/

The main difference to the R code are the explicitly named vectors choices and entries, which are only implicit in the R code.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
  • Thanks Ralf, was there a specific reason you included ```\\[[Rcpp::plugins(cpp11)]]``` in your code? It seems to run without it, but perhaps there's a benefit I'm not aware of. – jfrench Jan 13 '20 at 21:44
  • @dr_jfrench The initialization of `choices` uses C++11. Often one does not have to request that, though, since many systems nowadays default to C++11. – Ralf Stubner Jan 13 '20 at 22:24
  • That makes sense. I didn't even notice that it was an alternative way to specify the vector. That's simpler than the standard(?) Rcpp approach of something like ```IntegerVector choices = IntegerVector::create(1, 2, 3);``` – jfrench Jan 13 '20 at 22:34
  • 1
    @dr_jfrench, see this post https://stackoverflow.com/a/2236233/4408538. It is called list initialization. There is more info here https://en.cppreference.com/w/cpp/language/list_initialization – Joseph Wood Jan 13 '20 at 23:37