I need to speed up data processing in R by using C++. I already have the C++ code; at the moment it reads from a txt file the data that R should pass to it. Since I need R for my analysis, I want to integrate the C++ code into R.

The C++ code needs a (large) data frame (for which I use std::vector< std::vector> >) and a set of parameters, so I am thinking about passing the parameters through the .Call interface and then dealing with the data in the following way:

  • R: write the data to a txt file with a given encoding

  • C++: read from the txt file, do what I need to do and write the result to a txt file (the result is still a dataset -> std::vector)

  • R: read the result from the txt file

This would save me from rewriting part of the code. The possible bottleneck is the reading and writing. Do you think that is a real problem?

Alternatively, is it reasonable to copy all my data into C++ structures through the .Call interface?

Thank you.

hrbrmstr
enovap
    This is likely to get closed without more details. We have no idea what "large" means to you. Why are you using `std::vector< std::vector> >` vs `Rcpp::DataFrame`? Do you process the whole data frame in C++ or just certain columns? What would this function return back to R? Unless you provide example code there's little folks can do but opine. – hrbrmstr May 04 '18 at 10:34
    You should look at [Rcpp](https://cran.r-project.org/package=Rcpp)! – Ralf Stubner May 04 '18 at 12:14

2 Answers

You could start with the very simple DataFrame example in the RcppExamples package:

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List DataFrameExample(const DataFrame & DF) {

    // access each column by name
    IntegerVector a = DF["a"];
    CharacterVector b = DF["b"];
    DateVector c = DF["c"];

    // do something
    a[2] = 42;
    b[1] = "foo";
    c[0] = c[0] + 7; // move up a week

    // create a new data frame
    DataFrame NDF = DataFrame::create(Named("a")=a,
                                      Named("b")=b,
                                      Named("c")=c);

    // and return old and new in list
    return List::create(Named("origDataFrame") = DF,
                        Named("newDataFrame") = NDF);
}
```

You can assign vectors (from either Rcpp or the STL) and matrices (again, either from Rcpp or, if you prefer, nested STL vectors). You also have Eigen and Armadillo via RcppEigen and RcppArmadillo. And on and on: there are over 1350 packages on CRAN you could study, and a large set of ready-to-run examples is at the Rcpp Gallery.
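If the existing code already works on nested STL vectors, one way to keep it unchanged is to copy each data frame column into a `std::vector<double>` and hand the result over. The following is only a sketch under assumptions: `column_means()` is a hypothetical stand-in for the existing routine, `column_means_df()` is a made-up wrapper name, and every column is assumed to be numeric. The Rcpp part is guarded with `__has_include` so that the core function also compiles on its own.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the existing analysis code: it only sees
// nested STL vectors, one inner vector per data frame column.
std::vector<double> column_means(const std::vector<std::vector<double>>& cols) {
    std::vector<double> means;
    means.reserve(cols.size());
    for (const std::vector<double>& col : cols) {
        double sum = 0.0;
        for (double v : col) sum += v;
        // An empty column is given a mean of 0 here (an arbitrary choice).
        means.push_back(col.empty() ? 0.0 : sum / static_cast<double>(col.size()));
    }
    return means;
}

#if __has_include(<Rcpp.h>)
#include <Rcpp.h>

// Thin glue layer (assumption: all columns are numeric). Each column is
// copied once, in memory, into the structure the existing code expects.
// [[Rcpp::export]]
Rcpp::NumericVector column_means_df(const Rcpp::DataFrame& df) {
    std::vector<std::vector<double>> cols;
    cols.reserve(df.size());
    for (R_xlen_t j = 0; j < df.size(); ++j) {
        cols.push_back(Rcpp::as<std::vector<double>>(df[j]));
    }
    return Rcpp::wrap(column_means(cols));
}
#endif
```

After `Rcpp::sourceCpp()` on a file with this content, the wrapper should be callable from R as `column_means_df(your_data_frame)`. The copy costs memory proportional to the data, but no file is involved.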

Dirk Eddelbuettel
Reading and writing large datasets back and forth is not an optimal way to pass data between R and your C++ code. Depending on how long your C++ code runs, this might or might not be the worst bottleneck, but the approach should be avoided.
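To make the cost concrete, here is a small sketch of what a text hand-off does to every single value: it is formatted as characters and then parsed again. A `std::stringstream` stands in for the txt file, and `text_round_trip()` and `kExactDigits` are names invented for this illustration. It also shows a side effect worth knowing: unless enough significant digits are written, the values that come back are not the values that went in.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <vector>

// Number of significant digits that lets a double survive a text round trip.
const int kExactDigits = std::numeric_limits<double>::max_digits10;

// Format every value as text and parse it back, as a txt hand-off would.
// `digits` is the number of significant digits used when writing.
std::vector<double> text_round_trip(const std::vector<double>& values, int digits) {
    std::stringstream buffer;  // stands in for the txt file
    buffer << std::setprecision(digits);
    for (double v : values) buffer << v << '\n';

    std::vector<double> parsed;
    double v = 0.0;
    while (buffer >> v) parsed.push_back(v);
    return parsed;
}
```

With `digits = 6` (the stream default) a value such as `1.0 / 3.0` comes back changed, while `digits = kExactDigits` preserves it. Passing the data in memory avoids both the formatting work and this pitfall.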

You can look at the following solution for passing a data.frame (or data.table) object: Passing a `data.table` to c++ functions using `Rcpp` and/or `RcppArmadillo`

As for passing additional parameters, the solution will depend on what kind of parameters we are talking about. If those are just numeric values, then you can pass them directly to C++ (see High performance functions with Rcpp: http://adv-r.had.co.nz/Rcpp.html).
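For plain numeric parameters, a minimal sketch could look like the following. The function `rescale()` and its arguments `center` and `scale` are hypothetical and only illustrate the mechanism: with Rcpp attributes, scalar arguments and `std::vector<double>` are converted automatically, so the function body can stay ordinary C++. The include is guarded so the file also compiles without Rcpp.

```cpp
#include <vector>

#if __has_include(<Rcpp.h>)
#include <Rcpp.h>  // only needed when compiled through Rcpp::sourceCpp()
#endif

// Hypothetical example: shift and scale one column. The two numeric
// parameters arrive as ordinary doubles, with no file involved.
// [[Rcpp::export]]
std::vector<double> rescale(const std::vector<double>& x, double center, double scale) {
    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) out.push_back((v - center) / scale);
    return out;
}
```

After `Rcpp::sourceCpp()` on such a file, a call like `rescale(c(1, 2, 3), center = 2, scale = 0.5)` from R should return a numeric vector.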

Katia