
The usual caveat applies: I don't have much experience with C++. I need to compute the equivalent of hist(x, breaks=breaks, plot=FALSE)$counts in Rcpp.

I've written the following Rcpp function to calculate frequencies:

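#include <Rcpp.h>
using namespace Rcpp;

// Count how many values of x fall in each interval (breaks[i], breaks[i+1]].
// Note: unlike hist() with its default include.lowest=TRUE, a value exactly
// equal to breaks[0] is not counted here.
// [[Rcpp::export]]
NumericVector get_freq(NumericVector x, NumericVector breaks) {
  int nbreaks = breaks.size();
  NumericVector out(nbreaks - 1);  // one count per bin
  for (int i = 0; i < nbreaks - 1; i++) {
    // Each iteration makes a full pass over x.
    LogicalVector temp = (x > breaks(i)) & (x <= breaks(i + 1));
    out[i] = sum(temp);
  }

  return out;
}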

The function is called multiple times by another Rcpp function.

The problem is that the run time grows in proportion to the length of x, since every bin triggers a full pass over x:

breaks <- seq(from=0, to=max(x)+1, length.out=101) 

library(microbenchmark)
microbenchmark(get_freq(runif(100, 1, 100), breaks),
               get_freq(runif(1000, 1, 100), breaks),
               get_freq(runif(3000, 1, 100), breaks))

Unit: microseconds
                                  expr      min       lq      mean   median       uq      max neval cld
 get_freq(runif(100, 1, 100), breaks)  176.420  184.611  190.1675  188.415  191.633  313.927   100 a  
 get_freq(runif(1000, 1, 100), breaks) 1700.119 1714.309 1807.4252 1732.302 1809.687 5564.958   100  b 
 get_freq(runif(3000, 1, 100), breaks) 5134.003 5157.701 5342.2800 5177.157 5434.180 9242.844   100   c

get_freq is called many times, typically with x of length 3000 or more, and it is the bottleneck in Rcpp code that is otherwise much faster than the R equivalent.

Any suggestions for ways to improve the speed of get_freq?
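One direction that might be worth trying is to visit each element of x once and look up its bin with a binary search, instead of scanning all of x for every bin. The sketch below is only an illustration under assumptions: it uses plain std::vector rather than Rcpp types, the name get_freq_single_pass is hypothetical, and it assumes breaks is sorted in increasing order. It keeps the same (breaks[i], breaks[i+1]] convention as get_freq.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): counts how many values of
// x fall in each interval (breaks[i], breaks[i+1]]. Assumes breaks is sorted
// in increasing order and has at least two elements.
std::vector<double> get_freq_single_pass(const std::vector<double>& x,
                                         const std::vector<double>& breaks) {
  assert(breaks.size() >= 2);
  std::vector<double> out(breaks.size() - 1, 0.0);  // empty bins stay at 0
  for (double v : x) {
    // First break that is >= v; v then lies in (breaks[k-1], breaks[k]].
    auto it = std::lower_bound(breaks.begin(), breaks.end(), v);
    // Skip values at or below the first break, or above the last break.
    if (it == breaks.begin() || it == breaks.end()) continue;
    out[static_cast<std::size_t>(it - breaks.begin()) - 1] += 1.0;
  }
  return out;
}
```

With n values and B breaks this does roughly n * log2(B) comparisons rather than n * B. Since NumericVector also exposes begin() and end(), the same loop should carry over to the Rcpp version with little change, though I have not benchmarked it.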

Update

After posting this question I realized I should have been searching for 'C++ histogram' instead of 'C++ frequency'. I found this answer (https://stackoverflow.com/a/13661186/2885462), which I thought did the job. Unfortunately it doesn't.

I need the frequency function to return a vector of fixed length (i.e. nbreaks - 1, as above). The linked answer doesn't do this: it only returns counts for values that were actually observed.
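The fixed-length requirement can be met by allocating the counts vector up front, so bins with no observations simply stay at zero. A minimal sketch, assuming the breaks are equally spaced (as they are when built with seq(..., length.out=101)); the function name get_freq_equal_width and its lo/hi/nbins parameters are hypothetical, and plain std::vector is used instead of Rcpp types. Because the bin index is computed arithmetically, values lying almost exactly on an interior break may land in a neighbouring bin due to floating-point rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): counts for nbins
// equal-width intervals (lo + i*w, lo + (i+1)*w], where w = (hi - lo) / nbins.
// The result always has length nbins, with zeros for empty bins.
std::vector<double> get_freq_equal_width(const std::vector<double>& x,
                                         double lo, double hi,
                                         std::size_t nbins) {
  assert(nbins > 0 && hi > lo);
  const double width = (hi - lo) / static_cast<double>(nbins);
  std::vector<double> out(nbins, 0.0);
  for (double v : x) {
    // Skip values outside (lo, hi]; this also skips NaN.
    if (!(v > lo) || v > hi) continue;
    std::size_t bin =
        static_cast<std::size_t>(std::ceil((v - lo) / width)) - 1;
    if (bin >= nbins) bin = nbins - 1;  // guard against rounding at the top edge
    out[bin] += 1.0;
  }
  return out;
}
```

For the benchmark setup above, lo would be 0, hi would be max(x)+1, and nbins would be 100. This costs one arithmetic step per element of x, but it only applies when the breaks really are evenly spaced.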

Adrian
  • After posting this question I realized I should be searching for 'C++ histogram' instead of 'C++ frequency'. I found [this](https://stackoverflow.com/a/13661186/2885462) which seems to do the job – Adrian Feb 16 '18 at 21:22
  • It turns out that the linked answer doesn't solve the problem. I need the function to return a vector of fixed length (i.e nbreaks). I have a lot of zero observations which results in vector `out` of different lengths if the linked function is used – Adrian Feb 16 '18 at 23:45
