
I am trying to run a routine in R, involving replicating a random experiment. I encountered errors with the dimension of the output, which led me to discover the following peculiarity:

length(replicate(100 - 100*8 / 10, 4))
# 20
length(replicate(100 *(1- 8/ 10), 4))
# 19

As a sanity check, I ran the lines of code to make sure that the expressions in the first argument of replicate produced the same output.

100 - 100*8 / 10
# 20
100 *(1- 8/ 10)
# 20

I was wondering if people are experiencing the same issue. What I really want to know is, why does this happen?

Note: I am aware of the difference between rep and replicate, and my routine requires the latter, not the former.

3 Answers


The two expressions do not evaluate to exactly the same value:

(100 *(1- 8/ 10)) == 20
#[1] FALSE

(100 - 100*8 / 10) == 20
#[1] TRUE

because

20 - (100 *(1- 8/ 10))
#[1] 3.552714e-15

and the n in ?replicate is an integer

n - integer: the number of replications.

Converting that value to an integer truncates it toward zero (the same as floor for a positive number), which gives 19:

as.integer((100 *(1- 8/ 10)))
#[1] 19

floor((100 *(1- 8/ 10)))
#[1] 19
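To make the truncation explicit, here is a small sketch comparing as.integer, floor, and round on the same value. The variable names are only illustrative. The negative case is included because that is where truncation and floor stop agreeing.

```r
x <- 100 * (1 - 8 / 10)   # stored as a value slightly below 20

truncated <- as.integer(x)  # drops the fractional part (toward zero)
floored   <- floor(x)       # largest whole number not above x
rounded   <- round(x)       # nearest whole number
print(c(truncated = truncated, floored = floored, rounded = rounded))

# For a negative input, truncation and floor give different answers.
neg_truncated <- as.integer(-19.9)
neg_floored   <- floor(-19.9)
print(c(neg_truncated = neg_truncated, neg_floored = neg_floored))
```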

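One option is to wrap the expression in ceiling (round is the safer choice in general, because the floating-point error can also land slightly above the intended value, where ceiling would overshoot by one):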

length(replicate(ceiling(100 *(1- 8/ 10)), 4))
#[1] 20
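As a slightly more defensive variant of the same idea, the count can be rounded to the nearest whole number and checked before it is passed to replicate. This is only a sketch: n_reps is a hypothetical helper, not a base R function, and the tolerance of 1e-8 is an arbitrary assumption.

```r
# Hypothetical helper (not part of base R): turn a computed count into an
# integer, rounding away tiny floating-point error.
n_reps <- function(x, tol = 1e-8) {
  n <- round(x)
  # Refuse values that are not close to a whole number at all.
  if (abs(x - n) > tol) {
    stop("replication count is not close to a whole number: ", x)
  }
  as.integer(n)
}

n <- n_reps(100 * (1 - 8 / 10))
result_length <- length(replicate(n, 4))
print(n)
print(result_length)
```

With this helper, both ways of writing the expression from the question should lead to the same number of replications, and a genuinely fractional count such as 19.5 raises an error instead of being silently truncated.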
akrun

It has to do with the fact that floating-point numbers cannot precisely represent all real numbers, because of finite machine precision.

In your case, this issue presents itself as 1- 8/10 not being the same as 0.2:

identical(    8/10, 0.8 )   # TRUE
identical( 1- 8/10, 0.2 )   # FALSE

The root cause is that 0.8 and 0.2 cannot be represented exactly in binary with a finite number of bits. In memory, each value is rounded to the nearest representable binary number (R stores numeric values as 64-bit doubles), and that rounding error then propagates through the arithmetic operations.
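One way to see this, as a rough sketch, is to print the values with more digits than R shows by default, and to compare them with a tolerance rather than bit for bit. The variable names are illustrative only.

```r
# Print 20 decimal places to reveal the values actually stored.
stored_08   <- sprintf("%.20f", 8 / 10)
stored_diff <- sprintf("%.20f", 1 - 8 / 10)
stored_02   <- sprintf("%.20f", 0.2)
print(c(stored_08, stored_diff, stored_02))

# The gap between 0.2 and 1 - 8/10 is tiny but not zero.
gap <- 0.2 - (1 - 8 / 10)
print(gap)

# all.equal compares within a tolerance, unlike identical or ==.
close_enough <- isTRUE(all.equal(1 - 8 / 10, 0.2))
print(close_enough)
```

For comparisons like this, all.equal wrapped in isTRUE is usually the more robust test, since identical and == both require an exact match.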

Artem Sokolov

This is a very common floating-point arithmetic problem present in many programming languages. Some further reading on the topic can be found here.

Katie