Rcpp version of base-R seq drops values

Question

I wrote an Rcpp version of the base-R seq function.

library(Rcpp)

cppFunction('NumericVector seqC(double x, double y, double by) {

  // length of result vector
  int nRatio = (y - x) / by;
  NumericVector anOut(nRatio + 1);

  // compute sequence
  int n = 0;
  for (double i = x; i <= y; i = i + by) {
    anOut[n] = i;
    n += 1;
  }

  return anOut;
}')

For the following tests, it works just fine.

seqC(1, 11, 2)
[1]  1  3  5  7  9 11

seqC(1, 10, 2)
[1]  1  3  5  7  9 11

Also, it works (sometimes) when passing values with decimal digits rather than integers.

seqC(0.43, 0.45, 0.001)
[1] 0.430 0.431 0.432 0.433 0.434 0.435 0.436 0.437 0.438 0.439 0.440 0.441 0.442 0.443 0.444 0.445 0.446 0.447 0.448 0.449 0.450

However, sometimes the function does not seem to work as expected since the last entry of the sequence is being dropped (or rather, the output vector anOut does not have the proper size), which - according to my rather scarce C++ skills, may be attributed to some kind of rounding errors.

seqC(0.53, 0.59, 0.001)
 [1] 0.530 0.531 0.532 0.533 0.534 0.535 0.536 0.537 0.538 0.539 0.540 0.541 0.542 0.543 0.544 0.545 0.546 0.547 0.548 0.549 0.550 0.551
[23] 0.552 0.553 0.554 0.555 0.556 0.557 0.558 0.559 0.560 0.561 0.562 0.563 0.564 0.565 0.566 0.567 0.568 0.569 0.570 0.571 0.572 0.573
[45] 0.574 0.575 0.576 0.577 0.578 0.579 0.580 0.581 0.582 0.583 0.584 0.585 0.586 0.587 0.588 0.589

In the last example, for instance, the last value (0.590) is missing. Does anyone know how to fix this?

score 4 · Accepted Answer · answered Jul 10 '15 at 15:41

As noted by others, the problem you are experiencing is fundamentally a floating point arithmetic error. A common workaround is to scale your doubles up to sufficiently large integers, perform the task, and then adjust the result to the original scale of your inputs. I took a slightly different approach than @RHertel by letting the amount of scaling (adjust) be determined by the precision of the increment rather than using a fixed amount, but the idea is essentially the same.

#include <Rcpp.h>

struct add_multiple {
  int incr;
  int count;
  add_multiple(int incr)
    : incr(incr), count(0)
    {}
  inline int operator()(int d) {
    return d + incr * count++;
  }
};

// [[Rcpp::export]]
Rcpp::NumericVector rcpp_seq(double from_, double to_, double by_ = 1.0) {
  int adjust = std::pow(10, std::ceil(std::log10(10 / by_)) - 1);
  int from = adjust * from_;
  int to = adjust * to_;
  int by = adjust * by_;

  std::size_t n = ((to - from) / by) + 1;
  Rcpp::IntegerVector res = Rcpp::rep(from, n);
  add_multiple ftor(by);

  std::transform(res.begin(), res.end(), res.begin(), ftor);
  return Rcpp::NumericVector(res) / adjust;
}

/*** R

all.equal(seq(.53, .59, .001), seqC(.53, .59, .001)) &&
  all.equal(seq(.53, .59, .001), rcpp_seq(.53, .59, .001))
# [1] TRUE

all.equal(seq(.53, .54, .000001), seqC(.53, .54, .000001)) &&
  all.equal(seq(.53, .54, .000001), rcpp_seq(.53, .54, .000001))
# [1] TRUE 

microbenchmark::microbenchmark(
  "seq" = seq(.53, .54, .000001),
  "seqC" = seqC(0.53, 0.54, 0.000001),
  "rcpp_seq" = rcpp_seq(0.53, 0.54, 0.000001),
  times = 100L)
Unit: microseconds
     expr        min          lq        mean     median         uq        max neval
      seq    896.190   1015.7940   1167.4708   1132.466   1221.624   1651.571   100
     seqC 212293.307 219527.6590 226933.4329 223384.592 227860.410 398462.561   100
 rcpp_seq    182.848    194.1665    225.4338    227.396    244.942    320.436   100

*/

Where seqC was @RHertel's revised implementation that produced the correct result. FWIW I think the slow performance of this function is mainly do to the use of push_back on the NumericVector type, which the Rcpp developers strongly advise against.

I fully agree that the push_back implementation is not optimal in terms of computing speed. In the meantime, I have further revised the code to allow for a dynamic adjustment. — RHertel, Jul 10 '15 at 15:45
Thanks to both of you (also @RHertel) for the illustrative and imaginative solutions, and also for the information on `push_back`. For my purposes, everything works just fine now! — fdetsch, Jul 13 '15 at 08:50

score 2 · Answer 2 · edited May 23 '17 at 12:08

The "<=" can create difficulties with floating point numbers. This is a variant of the famous question "Why are these numbers not equal?". Moreover, there is a similar issue with the vector length, which in the case of your last example should be 60, but it is actually calculated to be 59. This is most likely due to the conversion to an integer (by casting, i.e., truncation) of a value like 59.999999 or something similar.

It seems to be very difficult to fix these problems, so I have rewritten a considerable part of the code, hoping that now the function operates as required.

The following code should provide correct results for essentially any kind of increasing series (i.e., y > x, by > 0).

cppFunction('NumericVector seqC(double x, double y, double by) {
 NumericVector anOut(1);
 // compute sequence
 double min_by = 1.e-8;
 if (by < min_by) min_by = by/100;
 double i = x + by;
 anOut(0) = x;
 while(i/min_by < y/min_by + 1) { 
  anOut.push_back(i);
  i += by;
 }
return anOut;
}')

Hope this helps. And thanks a lot to @Konrad Rudolph for pointing out mistakes in my previous attempts!

You didn’t actually fix the code, just shift the error. Try `seqC(1, 1.8, 0.5)`. — Konrad Rudolph, Jul 10 '15 at 13:04
(Incidentally, the fix isn’t trivial, otherwise I’d have contributed one. Good on you for trying!) — Konrad Rudolph, Jul 10 '15 at 13:07
Unfortunately I think this will fundamentally never solve the problem. Using a smaller value for the increment of `nRatio1` will merely shift the intervals for which this function will yield the wrong result. For instance, now this fails: `seqC(0.1, 0.199, 0.1)` — Konrad Rudolph, Jul 10 '15 at 13:14

Rcpp version of base-R seq drops values

2 Answers2