
I'm looking for a way to consistently ignore small differences between floating point numbers in R (these are double precision floating points as per IEC 60559), by using base R tools and without resorting to C or C++. In other words, I would like to "round" the significand portion of the double precision floating point numbers such that things like this return TRUE instead of FALSE:

1.45 - .55 == 2.45 - 1.55
## [1] FALSE

Something like:

round_significand(1.45 - .55, bits=48) == round_significand(2.45 - 1.55, bits=48)
## [1] TRUE

A simple round doesn't work because the level to which we need to round depends on the magnitude of the number.
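
For instance, the gap between adjacent representable doubles grows with the magnitude of the number (it is roughly .Machine$double.eps times the value), so a fixed number of decimal places is too coarse at large magnitudes and needlessly fine at small ones:

# approximate spacing between adjacent doubles near 1 and near 1e10
.Machine$double.eps * c(1, 1e10)
## [1] 2.220446e-16 2.220446e-06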

data.table does something of the sort internally, from ?setNumericRounding:

Computers cannot represent some floating point numbers (such as 0.6) precisely, using base 2. This leads to unexpected behaviour when joining or grouping columns of type 'numeric'; i.e. 'double', see example below. In cases where this is undesirable, data.table allows rounding such data up to approximately 11 s.f. which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes off the significand. Other possible values are 1 byte rounding, or no rounding (full precision, default).
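
For concreteness, I imagine "rounding bits off the significand" in base R would look something like the sketch below (my own illustration, not data.table's actual code; it ignores zero, NA and Inf inputs, and log2() could land on the wrong side exactly at powers of two). I'm not sure this round trip is any safer than the decimal version further down, though:

round_significand <- function(x, bits = 48) {
  e <- floor(log2(abs(x)))   # x / 2^e lies in [1, 2)
  # scaling by powers of two is exact, so the only rounding happens in round()
  round(x / 2^e * 2^(bits - 1)) / 2^(bits - 1) * 2^e
}

round_significand(1.45 - .55, bits = 48) == round_significand(2.45 - 1.55, bits = 48)
## [1] TRUE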

I'm also working on a hack implementation that scales everything to a decimal number x such that floor(log10(abs(x))) == 0 (i.e. 1 <= abs(x) < 10), rounds that, and scales back, e.g.:

rnd_sig <- function(x, precision=10) {
  # scale x so that abs(x) lies in [1, 10), round there, then scale back
  # (note: this breaks for x == 0, where log10() returns -Inf)
  exp <- floor(log10(abs(x)))
  round(x * 10 ^ (-exp), precision) / 10 ^ (-exp)
}

but I don't know enough about floating point numbers to be sure this is safe (or when it is safe, and not).
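
At least for the example at the top it seems to do what I want:

rnd_sig(1.45 - .55) == rnd_sig(2.45 - 1.55)
## [1] TRUE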

BrodieG
  • You can check this post: https://stackoverflow.com/questions/9508518/why-are-these-floating-point-numbers-not-equal – Daniel Gimenez May 28 '18 at 14:56
  • Maybe `signif`? `signif(1.45 - .55, 2) == signif(2.45 - 1.55, 2)` – lmo May 28 '18 at 15:01
  • @DanielGimenez thanks, that's a great reference; I'm surprised I didn't find it. However, it seems to me that won't really work well for large numbers (e.g. `abs(a * 10^10 - b * 10^20) < .Machine$double.eps ^ 0.5`). – BrodieG May 28 '18 at 15:03
  • @lmo that might be the ticket. Perhaps not exactly in terms of bit rounding, but probably close enough for my purposes. – BrodieG May 28 '18 at 15:05
  • Can you explain why you are testing for equality after using operations that approximate results? What is the specific situation you are working with? Attempting to test for a property that is not expected to be present in the computed results may not be a good idea. – Eric Postpischil May 28 '18 at 18:32
  • As in my example, in my application when two numbers are equal except for double floating precision issues, they are probably equal. I need them recognized as such so they can be grouped together. Simple arithmetic states "these numbers are different", when the reality is "these numbers may or may not be different, we don't know given the precision available to us". I'm willing to assume the numbers are equal in that case. – BrodieG May 28 '18 at 18:35
  • Whatever the rounding algorithm, two numbers can be arbitrarily close yet sit on opposite sides of a rounding boundary, so one is rounded down and the other up. For example, round(0.49999) and round(0.50001). So the strategy round_algorithm(a) == round_algorithm(b) is doomed to fail in some cases... – aka.nice May 29 '18 at 12:05
  • @aka.nice sure, but keep in mind I'm explicitly looking at differences that only happen due to double precision imprecision. So if you round several orders of magnitude above that, and you end up with different numbers, the differences are definitely "real" in as much as they can't possibly be the result of double precision imprecision. – BrodieG May 29 '18 at 12:49
  • @lmo will you add your suggestion as an answer so I can accept? – BrodieG May 30 '18 at 00:55
  • @BrodieG added a brief answer. – lmo Jun 03 '18 at 11:48

3 Answers


There is no general answer for how much a result computed with floating-point may differ from the exact mathematical result. In general, the final error of a sequence of computations can range from zero to infinity (and the computation may also produce Not-a-Number results when there is an exact mathematical result, or a numeric result when there is no defined mathematical result). Therefore, determining what tolerance to use to classify whether two computed results are equal or not requires a problem-specific solution: one must analyze the calculations and numbers involved in the specific problem and determine bounds on the possible error, or weigh the specific advantages and disadvantages of accepting incorrect classifications.

The study of errors that arise in numerical computation is numerical analysis. It is a broad field addressed by many books. No simple answer exists.

In simple situations, it may be possible to determine bounds on the errors and to show that these bounds are less than the differences between results that are known to be different. In other words, given a computation that ideally would produce exact results a and b but actually produces approximations a' and b', it might be possible to show that there is some bound E on the error such that |a' - b'| < E if and only if a equals b. However, it is not possible to answer this question without knowing what computations are performed and, possibly, knowing what the domain of input values is.
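
As a small illustration of that last point (with made-up inputs): if the exact results are known to be multiples of 0.01 and the accumulated error is known to be well below 0.005, then rounding to two decimals classifies equality correctly:

a <- 0.1 + 0.2   # exact value 0.3, computed value slightly above it
b <- 0.6 - 0.3   # exact value 0.3, computed value slightly below it
a == b
## [1] FALSE
round(a, 2) == round(b, 2)   # the error is far smaller than half the 0.01 spacing
## [1] TRUE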

Eric Postpischil
  • Thanks Eric (+1). I'm willing to be sloppy about this so the fact that I can't know for sure the degree of precision I actually have in my number does not dissuade me from wanting to reduce the precision by some amount I hope will be enough in most cases. – BrodieG May 30 '18 at 12:59

One possible solution is to use signif, a function related to round and documented in the same help file. The help file, ?signif, says

signif rounds the values in its first argument to the specified number of significant digits.

Whereas

round rounds the values in its first argument to the specified number of decimal places (default 0).

So it appears that signif may be more closely related to your problem.
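
For example, with the numbers from the question, keeping 10 significant digits is already enough to absorb the representation error (how many digits are safe to keep still depends on how much error your computations can accumulate):

signif(1.45 - .55, 10) == signif(2.45 - 1.55, 10)
## [1] TRUE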

lmo

Warning: this is not a direct answer to the question.

I find the following two functions useful. They allow me to compare doubles to a given degree of precision.

are.equal <- function(x, y, tol = .Machine$double.eps^0.5) abs(x - y) < tol
is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < tol

are.equal(1.45 - 0.55, 2.45 - 1.55)
#[1] TRUE

is.zero(1.45 - 0.55 - (2.45 - 1.55))
#[1] TRUE
Rui Barradas
  • Thanks! unfortunately this doesn't address the issue with larger numbers (see my comment above). – BrodieG May 28 '18 at 15:05
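
For larger magnitudes the usual fix is a relative rather than an absolute tolerance (base R's all.equal() does something similar for numeric vectors, comparing a mean relative difference against a tolerance). A minimal sketch along those lines, where are.equal.rel is a made-up name and the pmax(..., 1) floor is an arbitrary fallback for inputs near zero:

are.equal.rel <- function(x, y, tol = .Machine$double.eps^0.5) {
  # tolerance scales with the magnitude of the inputs,
  # falling back to an absolute tolerance near zero
  abs(x - y) < tol * pmax(abs(x), abs(y), 1)
}

are.equal.rel(1.45 - 0.55, 2.45 - 1.55)
#[1] TRUE
are.equal.rel(1.45 * 10^10 - 0.55 * 10^10, 2.45 * 10^10 - 1.55 * 10^10)
#[1] TRUE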