I have found a lot of information on this online, but nothing that exactly answers my question. My issue is not with how the numbers are displayed, but with the calculations and storage underneath the presentation.
The issue is with floating-point numbers in R. I wish to truncate them; however, I want to make sure they are stored correctly after they are truncated.
The problem: I have a dataset in which I compare the difference between pairs of numbers against a threshold of my choosing, exact to 2 decimal places (e.g. 0.00, 0.05, or 1.00). When I test whether the difference is exactly zero, I want to be sure I am testing the true difference and that there is no storage issue going on underneath that I am unaware of.
So far, I have tried:
(1) round (and testing against 0 as well as very small values like 1e-10)
(2) multiplying by 100 and converting with as.integer
These calculations come up with different answers when I calculate the percentage of observations that have a difference greater than my chosen threshold in my dataset.
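Roughly, a minimal sketch of what I mean by those two attempts (standalone vectors for illustration only, not my actual code or data):

x <- c(0.00, 988.36, 0.00, 2031.46, 0.00)
y <- c(0.00, 30.00, 0.00, 2031.46, 0.00)

# (1) round the difference to 2 decimal places, then compare against 0
#     (or against a tiny tolerance such as 1e-10)
r <- round(x - y, 2)
mean(r == 0)
mean(abs(r) <= 1e-10)

# (2) scale to "cents" and coerce to integer before comparing
xi <- as.integer(x * 100)
yi <- as.integer(y * 100)
mean(xi - yi == 0L)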
In short, it would be great to know how best to store the numbers so that the test of whether the difference is actually 0 is as accurate as possible.
Note: This needs to work for large datasets.
Example:
library(data.table)

dt <- data.table(d = c(0.00, 988.36, 0.00, 2031.46, 0.00),
                 c = c(0.00, 30.00, 0.00, 2031.46, 0.00),
                 n = c("a", "b", "a", "a", "b"))

dt[, diff := d - c]          # signed difference
dt[, abs_diff := abs(diff)]  # absolute difference
dt[, pct_diff := mean(abs_diff == 0, na.rm = TRUE), by = "n"]  # share of exact-zero differences per group
The last step is the problem, as I keep getting different numbers for pct_diff depending on the threshold. (For example, mean(abs_diff <= 1e-10) and mean(abs_diff <= 1e-15) give me different answers.)
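For what it's worth, here is a self-contained illustration of the kind of representation error I suspect is behind this (the numbers are chosen purely for illustration and are not from my data):

# 1000.05 and 0.05 cannot be represented exactly in binary, so the computed
# difference is not bit-identical to the literal 0.05
(1000.05 - 1000) == 0.05                 # FALSE
abs((1000.05 - 1000) - 0.05)             # about 4.5e-14 here, not exactly 0
abs((1000.05 - 1000) - 0.05) <= 1e-10    # TRUE
abs((1000.05 - 1000) - 0.05) <= 1e-15    # FALSE

So a cutoff of 1e-10 absorbs that kind of representation error while 1e-15 does not, which matches the behaviour I am seeing.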