My Rcpp code occasionally fails (segfault, etc.) for reasons I don't understand. The code creates a large data.frame and then tries to obtain a subset of it by calling the R subsetting function ([.data.frame) from within the same method that creates the frame. A much simplified version is shown below:
library(Rcpp)
src <- '
DataFrame test() {
  // R function to subset data.frame - what will be called to subset
  Function subsetinR("[.data.frame");
  // Make a dataframe in Rcpp to subset
  size_t n = 100;
  auto df = DataFrame::create(Named("a") = std::vector<double> (n, 2.0),
                              Named("b") = std::vector<double> (n, 4.0));
  // Now make a vector to subset with
  LogicalVector filter = LogicalVector::create(n, TRUE);
  for (size_t i = 0; i < n; i++) {
    if (i % 2 == 0) filter[i] = FALSE;
  }
  // Subset, here is where it fails!
  df = subsetinR(df, filter, R_MissingArg);
  return df;
}'
fun <- cppFunction(plugins=c("cpp11"), src, verbose = TRUE, depends="Rcpp")
fun()
However, while this occasionally works, at other times it fails with the following error:
*** caught segfault ***
address 0x7ff700000030, cause 'memory not mapped'
Anyone know what is going wrong?
Note: This is not a duplicate. I have seen other Stack Overflow answers that build the new data.frame by subsetting each column vector individually, e.g.
// Next up, create a new DataFrame Object with selected rows subset.
return Rcpp::DataFrame::create(Rcpp::Named("val1") = val1[idx],
Rcpp::Named("val2") = val2[idx],
Rcpp::Named("val3") = val3[idx],
Rcpp::Named("val4") = val4[idx]
);
However, I am explicitly looking to avoid the repeated [idx] subsetting, because idx is not known when the data.frame is constructed (it is only known at the end), and I am hoping to find a way that doesn't involve subsetting each column separately. If it's possible to transform the data.frame in one go at the end, that would work just fine.
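To make the goal concrete, the one-go operation I am after can be sketched in plain C++ with std::vector columns standing in for the Rcpp ones. This is only an illustration of the intended behavior, not Rcpp code: the `Columns` alias and the `filter_rows` helper are hypothetical names I made up for this sketch.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a data.frame: column name -> column values.
using Columns = std::map<std::string, std::vector<double>>;

// Keep only the rows where keep[i] is true, applied to every column
// in a single call, after all columns have been built.
Columns filter_rows(const Columns& cols, const std::vector<bool>& keep) {
    Columns out;
    for (const auto& kv : cols) {
        const std::vector<double>& col = kv.second;
        // The mask must have exactly one entry per row.
        if (col.size() != keep.size()) {
            throw std::invalid_argument("mask length must match column length");
        }
        std::vector<double> kept;
        for (std::size_t i = 0; i < col.size(); ++i) {
            if (keep[i]) kept.push_back(col[i]);
        }
        out[kv.first] = kept;
    }
    return out;
}
```

With two 100-row columns and a mask that drops every even index, I would expect both columns to come back with 50 rows, which is what I want `[.data.frame` to do for me on the Rcpp side.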