R seems to support an efficient NA value in floating point arrays. How does it represent it internally?

My (perhaps flawed) understanding is that modern CPUs can carry out floating point calculations in hardware, including efficient handling of Inf, -Inf and NaN values. How does NA fit into this, and how is it implemented without compromising performance?

smci
Szabolcs
  • For integers: https://stackoverflow.com/questions/56507748/internal-representation-of-int-na – Jimbo Feb 20 '22 at 22:56

1 Answer

R uses a NaN value, as defined for IEEE 754 floats, to represent NA_real_. Inf and -Inf are the ordinary IEEE infinities, not NaNs. We can use a simple C++ function to print the bit patterns and make this explicit:

Rcpp::cppFunction('void print_hex(double x) {
    uint64_t y;
    static_assert(sizeof x == sizeof y, "Size does not match!");
    std::memcpy(&y, &x, sizeof y);
    Rcpp::Rcout << std::hex << y << std::endl;
}', plugins = "cpp11", includes = c("#include <cstdint>", "#include <cstring>"))
print_hex(NA_real_)
#> 7ff80000000007a2
print_hex(Inf)
#> 7ff0000000000000
print_hex(-Inf)
#> fff0000000000000

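The exponent (bits 2 through 12, directly after the sign bit) is all ones in each of these values. Under IEEE 754 that marks a value as either infinite or NaN, and the mantissa decides which: for Inf the mantissa is all zero, while for NA_real_ it is not, so NA_real_ is a NaN that carries a particular payload (the trailing 7a2 is 1954 in decimal). Here are some source code references.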

Ralf Stubner