I have found a lot of information on this online, but I haven't been able to find anything that exactly answers my question. My issue does not have to do with the presentation of the numbers, but instead the calculations and storage underneath the presentation.

The issue is with floating-point numbers in R. I wish to truncate them; however, I want to make sure I am storing them correctly after they are truncated.

The problem is: I have a dataset in which I want to compare the differences between pairs of numbers against any threshold I choose (exact to 2 decimal places, e.g. 0.00, 0.05, and 1.00). When I test whether a difference is exactly zero, I want to be sure the test sees the correct difference and that there is no storage problem going on behind the scenes that I am unaware of.

So far, I have tried:

(1) round (and testing against 0, and very small values like 1e-10)

(2) multiplying by 100 and as.integer

These calculations come up with different answers when I calculate the percentage of observations that have a difference greater than my chosen threshold in my dataset.

In short, it would be great to know how to best store the number to get the most accurate result when calculating whether or not the difference is actually 0.

Note: This needs to work for large datasets.

Example:

library(data.table)

dt <- data.table(d = c(0.00, 988.36, 0.00, 2031.46, 0.00),
                 c = c(0.00, 30.00, 0.00, 2031.46, 0.00),
                 n = c("a", "b", "a", "a", "b"))

dt[, diff := d - c]

dt[, abs_diff := abs(diff)]

dt[, pct_diff := mean(abs_diff == 0, na.rm = TRUE), by = "n"]

The last step is the problem, as I keep getting different numbers for pct_diff depending on the threshold. (For example, mean(abs_diff <= 1e-10) and mean(abs_diff <= 1e-15) give me different answers.)

dc3
  • You should provide some form of [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to make it more clear what's going on. – MrFlick Mar 03 '16 at 15:40
  • Can you give some examples and the output you want? I don't fully get either what you mean by `exactly 0` or your rounding process. When is a number stored correctly? What is the correct difference? – nicola Mar 03 '16 at 15:42
  • Sure - I can add an example. – dc3 Mar 03 '16 at 15:43
  • The results in the example you posted look regular to me. What's wrong with them? Also, you shouldn't compare floating point values. Do your data really care if the difference is 10^-10 or exactly 0? – nicola Mar 03 '16 at 16:22
  • I don't care if the data are minutely different - I am just trying to capture the differences correctly. I would like to compare them to the 0.01 value. I am just trying to capture this in the most accurate way - I keep getting different answers based on the methods I use. – dc3 Mar 03 '16 at 16:26
  • `abs(x - 0.01) < tol` is how you test this for floating point numbers. `tol` can be chosen after consulting `help(".Machine")`. – Roland Mar 03 '16 at 17:50
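
The tolerance test from the last comment can be sketched in base R as follows. The helper name `near` and the default tolerance `sqrt(.Machine$double.eps)` are assumptions made for illustration, not something specified in the thread.

```r
# Hypothetical helper (name and default tolerance are assumptions):
# TRUE where x lies within `tol` of `target`.
near <- function(x, target, tol = sqrt(.Machine$double.eps)) {
  abs(x - target) < tol
}

# Values that carry tiny floating-point rounding errors
x <- c(0.1 + 0.2 - 0.3, 0.01, 0.3 - 0.29)

is_zero      <- near(x, 0)     # compare against 0
is_one_cent  <- near(x, 0.01)  # compare against 0.01

# Share of values treated as zero
share_zero <- mean(is_zero)
```

A tolerance should sit well above the rounding noise of the data and well below the smallest difference that matters (0.01 here). For values in the low thousands, adjacent doubles are roughly 2e-13 apart, which is one plausible reason a cutoff of 1e-15 and a cutoff of 1e-10 can classify the same difference differently.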

1 Answer

Rounded numbers are stored as numeric, i.e., floating point numbers:

class(round(1.1))
#[1] "numeric"

class(floor(1.1))
#[1] "numeric"

It seems like you are looking for packages that support arbitrary-precision numbers, such as the Rmpfr package.
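
For data that are exact to two decimal places, a lighter alternative to arbitrary precision is to store the values as whole cents in integers, so that subtraction and comparison are exact. This is a different technique from the one suggested above, shown only as a sketch: `to_cents` is a hypothetical helper name, and the vectors reuse the values from the question.

```r
# Hypothetical helper (name is an assumption): 2-decimal values -> integer cents.
# round() comes first because as.integer() truncates toward zero, so a product
# sitting just below a whole number would otherwise lose a cent.
to_cents <- function(x) as.integer(round(x * 100))

d     <- c(0.00, 988.36, 0.00, 2031.46, 0.00)
c_val <- c(0.00, 30.00, 0.00, 2031.46, 0.00)
n     <- c("a", "b", "a", "a", "b")

# Integer subtraction is exact, so no tolerance is needed afterwards
diff_cents <- abs(to_cents(d) - to_cents(c_val))

# Share of rows per group whose difference is exactly zero
pct_zero <- tapply(diff_cents == 0L, n, mean)

# Share of rows per group whose difference is at most 0.05 (5 cents)
pct_within_5 <- tapply(diff_cents <= 5L, n, mean)
```

R integers top out at `.Machine$integer.max` (2147483647), i.e. about 21 million units when counting cents. For larger values, one option is to keep `round(x * 100)` as a double, which represents whole numbers exactly up to 2^53.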

Roland