
On R 3.5.2, converting a string to a double gives the wrong output:

# this is just to avoid scientific notation. 
options(scipen=999)

temp <- "2671768011130961018032700237"
as.numeric(temp)
# and the output is, 
2671768011130961013860062868

as.double(temp)
# and the output is 
2671768011130961013860062868

as.numeric(temp) == 2671768011130961018032700237
# this returns true

print(.Machine$double.xmax)
# and to check the overflow case, this prints out 179769313486231570838400602864442228000008602082842266064064680402680408280648240046204888888288080622822420842246006644866884860462806420066668022046626024066662068886808602862886866800048228686262462640668044406484606206082824406288200264266406808068464046840608044222802268424008466606886862062820068082688

I can't think of anything that could have caused this behaviour. Any help is appreciated.

  • This is a double precision issue. Double-precision numbers have roughly 15 digits of accuracy. Beyond that, there is no guarantee that arithmetic will be accurate to your least significant digit. – Joseph Wood Feb 13 '19 at 14:25
  • You don't even need the double conversion: `> 2671768011130961018032700237 [1] 2671768011130961013860062868` – iod Feb 13 '19 at 14:26
  • Indeed. I don't understand what that strange xmax value is because that would not fit a double. – Jurgen Vinju Feb 13 '19 at 14:28
  • If you need better than double precision, use high precision numbers: `library(Rmpfr); mpfr(temp, 256)` – Roland Feb 13 '19 at 14:29
  • You should read this section from Wikipedia: [IEEE 754 double-precision binary floating-point format: binary64](https://en.wikipedia.org/wiki/Double-precision_floating-point_format#IEEE_754_double-precision_binary_floating-point_format:_binary64). Basically, you have 53 bits of precision, which when converted to base 10 gives: log10(2^53 - 1) = 15.95459 – Joseph Wood Feb 13 '19 at 14:29
  • The problem here is with `options(scipen = 999)` giving the illusion that you are getting those digits accurately. – Joseph Wood Feb 13 '19 at 14:30
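
The 53-bit limit described in the comments above can be checked from the R console. This is a minimal sketch, assuming a platform with IEEE 754 doubles; the variable names are only illustrative.

```r
# Number of significand bits in an R double (53 on IEEE 754 platforms)
sig_bits <- .Machine$double.digits

# The same precision expressed in decimal digits, as in the comment above
dec_digits <- log10(2^sig_bits - 1)

# Above 2^53 not every integer is representable:
# 2^53 + 1 rounds back to 2^53, while 2^53 + 2 is a distinct double.
collapses <- (2^53 + 1) == 2^53
distinct  <- (2^53 + 2) != 2^53

print(sig_bits)
print(dec_digits)
print(c(collapses, distinct))
```

The 28-digit value in the question is far above 2^53, so most of its trailing digits cannot be stored.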

1 Answer


First, notice that the following equality comparison is also returning TRUE:

as.numeric(temp) == 2671768011130961013860062868
[1] TRUE

The short answer is that double-precision floating-point numbers, in R and in most other programming languages, cannot represent every value exactly. Both of the following comparisons return TRUE:

as.numeric(temp) == 2671768011130961018032700237
as.numeric(temp) == 2671768011130961013860062868

What is happening here is that neither number on the RHS can be stored exactly. R rounds each literal to the nearest representable double when it parses it, and both of them round to the same double that as.numeric(temp) produces, so the comparison is TRUE in both cases.
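
To make this concrete, the gap between neighbouring doubles near this value can be computed directly. The sketch below assumes IEEE 754 doubles with a 53-bit significand; `ulp` and the other variable names are illustrative, not part of the original question.

```r
temp <- "2671768011130961018032700237"
x <- as.numeric(temp)

# Gap between neighbouring doubles at this magnitude. x lies between
# 2^91 and 2^92, so the gap is 2^(91 - 52) = 2^39, roughly 5.5e11.
ulp <- 2^(floor(log2(x)) - 52)

# The two literals from the question differ by far less than one gap,
# so they parse to the same double.
same_double <- 2671768011130961018032700237 == 2671768011130961013860062868

# Adding much less than half a gap leaves x unchanged;
# adding a full gap moves to the next representable double.
unchanged <- (x + 500000) == x
changed   <- (x + ulp) != x

print(c(same_double, unchanged, changed))
```

All three results are TRUE: any integer within about half a gap of `x` is indistinguishable from it once stored as a double.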

If you are looking for a "fix" here, then what you need is an exact numeric type, such as integer. The problem with integer is that your values are too large to be stored, so you would need something wider, like long in other languages such as Java. Base R does not seem to support this, but if you read here then you might find a few custom R packages which support things like int64. Note, though, that even a 64-bit integer holds only about 19 decimal digits, so a 28-digit value like this one needs an arbitrary-precision package such as gmp or Rmpfr.
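
As a hedged sketch of the arbitrary-precision route, the following uses `as.bigz()` from the CRAN package gmp. It assumes gmp is installed (the code skips itself otherwise); the variable names are illustrative.

```r
temp <- "2671768011130961018032700237"

# Assumption: the CRAN package 'gmp' is installed; skipped if it is not.
if (requireNamespace("gmp", quietly = TRUE)) {
  # Parse the string as an exact arbitrary-precision integer
  big <- gmp::as.bigz(temp)

  # Every digit survives the round trip back to a string
  print(identical(as.character(big), temp))

  # Arithmetic is exact as well: the last digit changes from 7 to 8
  print(as.character(big + 1))
}
```

For values with a fractional part, `mpfr()` from Rmpfr, as suggested in the comments on the question, is the analogous tool.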

Tim Biegeleisen
  • Fun game: `2671768011130961018032700237+500000==2671768011130961018032700237 [1] TRUE` – iod Feb 13 '19 at 14:29
  • Just to clarify: R uses `long int` for integers as defined in `C`, which uses 4 bytes. The equivalent of what you are referring to (i.e. `long` in Java) would be a `long long int` or `int64_t` as defined in ``. Also, this still would not help in this case, as the numbers are well beyond the precision of these data types (roughly 19 decimal digits). The only sensible solution would be to use `gmp` or `Rmpfr`. – Joseph Wood Feb 13 '19 at 15:35
  • @JosephWood Well that was a `long` comment, but thanks for clarifying. – Tim Biegeleisen Feb 13 '19 at 15:48