Rcpp and int64 NA value

Question

How can I pass an NA value from Rcpp to R in a 64-bit integer (integer64) vector?

My first approach would be:

#include <Rcpp.h>
#include <cstdint>
#include <cstring>

// [[Rcpp::export]]
Rcpp::NumericVector foo() {
  Rcpp::NumericVector res(2);

  int64_t val = 1234567890123456789;
  std::memcpy(&(res[0]), &(val), sizeof(double));
  res[1] = NA_REAL;

  res.attr("class") = "integer64";
  return res;
}

But it yields

#> foo()
integer64
[1] 1234567890123456789 9218868437227407266

I need to get

#> foo()
integer64
[1] 1234567890123456789 <NA>

You can't use `NA_REAL` after the `memcpy` because at that point the bit pattern is that of an `int64`. — Dirk Eddelbuettel, Apr 23 '20 at 12:04
I'd also edit the title. The _default 64 bit NA_ is just `NA_REAL`, which is not what your question is about. — Dirk Eddelbuettel, Apr 23 '20 at 12:10
But the memcpy copies only 64 bits (`sizeof(double)`) right? So `res[0]` gets 64 bits from `val` and then setting `res[1] = ...` uses the next 64 bits. I agree with the outcome, but don't really follow your first comment. — David, Apr 23 '20 at 12:13
I had hoped that `NA_REAL` uses the same bit pattern that the bit64 package uses (`1000...`). I guess I was wrong there. — David, Apr 23 '20 at 12:15
The whole point is that _the content_ of the vector is then _bit by bit_ an `int64_t` that is merely "parked" inside a `double` vector (aka `NumericVector`). There is no magic logic copy. Jens is doing _all_ the hard work by hand. Including mapping NAs. — Dirk Eddelbuettel, Apr 23 '20 at 12:15
I've programmed for a few years now and I think I can conclude that _hope_ is not a valid long-term strategy. ;-) — Dirk Eddelbuettel, Apr 23 '20 at 12:15
Interestingly, using `std::memcpy(&(res[1]), &(NA_REAL), sizeof(double));` doesn't work... — David, Apr 23 '20 at 12:20
That. Is. What. I. Have. Been. Trying. To. Explain. Look at, e.g., the R source for the existing NA defines. Look at some packages using `int64` and see what they do. — Dirk Eddelbuettel, Apr 23 '20 at 12:22

Dirk Eddelbuettel · Answer 1 · 2020-04-23T18:53:50.253

It's really much, much simpler. Several add-on packages offer int64 behaviour in R; the best of them is bit64, which gives us the integer64 S3 class and its associated behaviour.

And it defines the NA internally as follows:

#define NA_INTEGER64 LLONG_MIN

And that is all there is to it. R and its packages are foremost C code, and LLONG_MIN exists there and goes back (almost) all the way to the founding fathers.
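To make the define concrete, here is a minimal standalone sketch (plain C++, no Rcpp) of what reserving LLONG_MIN as the NA means at the bit level. The names `kNaInteger64`, `na_integer64_bits` and `is_na_integer64` are hypothetical and not part of bit64 or Rcpp; the sketch also assumes that `long long` and `double` are both 64 bits wide, which holds on mainstream platforms.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical constant mirroring the bit64 define quoted above.
constexpr std::int64_t kNaInteger64 = std::numeric_limits<std::int64_t>::min();

// Assumptions of this sketch: 64-bit long long and 64-bit double.
static_assert(LLONG_MIN == std::numeric_limits<std::int64_t>::min(),
              "long long is expected to be 64 bits wide");
static_assert(sizeof(double) == sizeof(std::int64_t),
              "double is expected to be 64 bits wide");

// Raw bits of the sentinel, viewed as an unsigned 64-bit value.
inline std::uint64_t na_integer64_bits() {
  std::uint64_t bits;
  std::memcpy(&bits, &kNaInteger64, sizeof bits);
  return bits;
}

// True if the 8 bytes parked in a double slot hold the integer64 NA.
// The slot is taken by reference so the bytes are read exactly as stored.
inline bool is_na_integer64(const double& slot) {
  std::int64_t v;
  std::memcpy(&v, &slot, sizeof v);
  return v == kNaInteger64;
}
```

On such a platform `na_integer64_bits()` is `0x8000000000000000`: only the top bit is set, which matches the `1000...` pattern mentioned in the comments on the question.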

There are two lessons here. The first is that R extends the IEEE idea of special values. IEEE defines NaN and Inf for floating-point values; R goes well beyond that and adds NA for each of its types, in pretty much the way shown above: by reserving one particular bit pattern. (Which, in one case, is the birthday of one of the two original R creators.)
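The reserved pattern for doubles also explains the number printed in the question. The sketch below assumes, based on the R sources as I understand them, that the double NA is a NaN whose high word is `0x7FF00000` and whose low word is 1954. The constant `kRNaRealBits` and both function names are hypothetical, and the code is plain C++ that does not include any R headers.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: bit pattern of R's double NA (high word 0x7FF00000, low word 1954).
constexpr std::uint64_t kRNaRealBits = 0x7FF00000000007A2ULL;

// The same 8 bytes read as a signed 64-bit integer, which is how
// bit64 interprets a slot once the vector has class "integer64".
inline std::int64_t na_real_as_int64() {
  std::int64_t v;
  std::memcpy(&v, &kRNaRealBits, sizeof v);
  return v;
}

// The low 32 bits carry the payload that marks this NaN as NA.
inline std::uint32_t na_real_payload() {
  return static_cast<std::uint32_t>(kRNaRealBits & 0xFFFFFFFFULL);
}
```

Under that assumption `na_real_as_int64()` is 9218868437227407266, the second value printed by the first attempt in the question, and `na_real_payload()` is 1954.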

The other is to admire the metric ton of work Jens did with the bit64 package and all the required conversion and operator functions. Seamlessly converting all possible values, including NA, NaN, Inf, ... is no small task.

And it is a neat topic that not too many people know. I am glad you asked the question because we now have a record here.

score 6 · Accepted Answer · answered Apr 23 '20 at 11:47

Alright, I think I found an answer... (not beautiful, but working).

Short Answer:

#include <Rcpp.h>
#include <cstdint>
#include <cstring>

// [[Rcpp::export]]
Rcpp::NumericVector foo() {
  Rcpp::NumericVector res(2);

  int64_t val = 1234567890123456789;
  std::memcpy(&(res[0]), &(val), sizeof(double));

  // This is the magic: 1ULL << 63 sets only the top bit,
  // which is the bit pattern bit64 uses for NA.
  int64_t v = 1ULL << 63;
  std::memcpy(&(res[1]), &(v), sizeof(double));

  res.attr("class") = "integer64";
  return res;
}

which results in

#> foo()
integer64
[1] 1234567890123456789 <NA>

Longer Answer

Inspecting how bit64 stores an NA

# the last value is the largest signed 64-bit integer; written as a double literal it is out of range, so it also becomes NA
a <- bit64::as.integer64(c(1, 2, NA, 9223372036854775807))
a
#> integer64
#> [1] 1    2    <NA> <NA>
bit64::as.bitstring(a[3])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"
bit64::as.bitstring(a[4])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"

Created on 2020-04-23 by the reprex package (v0.3.0)

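We see that the NA bit pattern is a 1 followed by 63 zeros. This can be recreated in Rcpp with int64_t v = 1ULL << 63;, which is the smallest value an int64_t can hold. Using memcpy() instead of a simple assignment with = ensures that no bits are changed: an assignment would convert the integer value to a double, whereas memcpy() copies the 8 bytes exactly as they are.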
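The difference between copying bits and assigning values can be sketched in plain C++ without Rcpp. The helper names `park_int64`, `unpark_int64` and `roundtrip_by_value` are hypothetical; the sketch assumes a 64-bit `double` and works on a single slot through a pointer, the same way the answer writes into `&(res[i])`.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "double is expected to be 64 bits wide");

// Copy the 8 bytes of v into a double slot without interpreting them.
inline void park_int64(std::int64_t v, double* slot) {
  std::memcpy(slot, &v, sizeof v);
}

// Read the 8 bytes of a double slot back as an int64.
inline std::int64_t unpark_int64(const double* slot) {
  std::int64_t v;
  std::memcpy(&v, slot, sizeof v);
  return v;
}

// What a plain assignment does instead: a value conversion to double.
// A double has a 53-bit significand, so large integers get rounded.
inline std::int64_t roundtrip_by_value(std::int64_t v) {
  double d = static_cast<double>(v);
  return static_cast<std::int64_t>(d);
}
```

For the value used in the answer, 1234567890123456789, parking and unparking returns the original number, while `roundtrip_by_value` does not, because the value needs more than 53 significant bits.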

Yes. If you look at some source packages you will see a corresponding `#define` statement declaring one bit pattern (often either `min` or `max`) to be the NA value. — Dirk Eddelbuettel, Apr 23 '20 at 12:05
