R the number of significant digits leads to unexpected results of inequality using eval and parse text

Question

I am working on boolean rules for terminal node assignment in CART-like trees as part of my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html).

I have noticed problematic behavior when evaluating inequalities stored as character strings using eval and parse of text. The issue has to do with how R converts the internal representation of a number to text.

Here's an example involving the number pi. I want to check if a vector (which I call x) is less than or equal to pi.

> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"

This rule checks whether the object x is less than or equal to pi, where pi is represented to 15 significant digits (14 decimal places). Now I will assign x the values 1, 2, 3 and pi:

> x = c(1,2,3,pi)

Here's what x is up to 15 digits:

> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979

Now let's evaluate this:

> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE

Whooaaaaa, it looks like pi is not less than or equal to pi. Right?

But if I hard-code the last element of x as pi to 14 decimal places, it works:

> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE

In the first case, x holds pi at full double precision, which is slightly greater than the value parsed from the 15-digit string in the rule, so the comparison returns FALSE. In the second case, both sides are parsed from the same 15-digit literal, so the result is TRUE.

How can I avoid this? I really need the first evaluation to come back TRUE, because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.

One solution I use is to add a small tolerance value.

> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE

However, this seems like an ugly solution.

Any comments and suggestions are greatly appreciated!

Floating point comparisons are not exact. Read https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal – Ronak Shah, Apr 28 '21 at 12:00
The whole approach of converting to character using paste and converting back using eval is pretty ugly TBH. Why not just round your integers and then do the comparison? – dash2, Apr 28 '21 at 13:46
It's not possible due to the nature of my work. I need to work with very complex strings of boolean operators. – H. Ishwaran, Apr 28 '21 at 14:41
Adding the small tolerance does not make it uglier than it already is. I'd go for that. – Sirius, Apr 28 '21 at 16:07

Sirius · Answer 1 · 2021-04-28T16:31:00.247

You could refer to pi by name, or go via a function instead, so that pi never gets stringified (which is your first problem here):


rule <- "x <= pi"
x <- c(1, 2, 3, pi)

eval(parse(text = rule)) ## All TRUE

## another way might be to throw stuff you need uneval'ed into a function or a block:

my_pi <- function() {
    pi
}

rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE

You still will suffer from the usual floating point issues, but imprecise stringification won't be your problem anymore.
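
If the thresholds are computed rather than being built-in constants like pi, the same idea might be generalized by keeping them in a named list and letting the rule refer to them by name. The following is only a sketch under that assumption: the list `thresholds` and the names `t1` and `t2` are made up for illustration and do not come from the question.

```r
# Hypothetical lookup table of thresholds; the names t1 and t2 are
# invented for this example.
thresholds <- list(t1 = pi, t2 = exp(1))

# The rule refers to the thresholds by name, so no number is ever
# converted to text.
rule <- "x <= t1 & x > t2"
x <- c(1, 2, 3, pi)

# Supply x and the thresholds as data when evaluating the parsed rule.
result <- eval(parse(text = rule), envir = c(list(x = x), thresholds))
print(result)
```

Because the numbers stay numeric until the comparison, the last element of x is compared against the very same double it was created from.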

Here's why your approach didn't work:


> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074

The stringified pi is smaller than R's pi by about 3e-15, which is enough to make the comparison fail.

The paste manual says it uses as.character to convert numbers to strings, and the as.character documentation in turn says that it uses 15 significant digits, which is what you are observing.
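
If the number really has to be embedded in the rule string, one option that might work is to avoid the 15-digit conversion by deparsing the value in hexadecimal floating point notation, which R's parser accepts as a numeric constant. This is a sketch rather than a recommendation: `num_to_text` is a hypothetical helper name, and the resulting rule text is much harder for a human to read.

```r
# Hypothetical helper: write a double as a hex floating point constant,
# using the "hexNumeric" deparse option, so that no decimal rounding
# takes place when the number becomes text.
num_to_text <- function(value) deparse(value, control = "hexNumeric")

rule <- paste0("x <= ", num_to_text(pi))
x <- c(1, 2, 3, pi)

# The parsed constant should be the same double as pi, so the last
# comparison is pi <= pi.
result <- eval(parse(text = rule))
print(rule)
print(result)
```

Usual floating point caveats still apply to values that were computed in different ways, so a tolerance may still be needed in those cases.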

edited Apr 28 '21 at 16:31

answered Apr 28 '21 at 11:59

Sirius


I understand there is a float issue (please read the entire question). What I want to know, is how to avoid this as I am automating this process. – H. Ishwaran Apr 28 '21 at 14:39
see my suggestion on going via a function to prevent pi from getting stringified (and ruined) prematurely – Sirius Apr 28 '21 at 16:24
appreciate the help – H. Ishwaran Apr 28 '21 at 17:19
