
I occasionally get odd results (floating-point precision errors) when summing in data.table. Here is one such case.

library(data.table)
DT<-data.table(value=c(100.1, 100.4, 100.41, 100.63))

> DT[,sum(value)]-401.54
[1] -5.684342e-14

With slight changes to the values (100.41 -> 100.42, 100.63 -> 100.62), this error doesn't happen.

DT<-data.table(value=c(100.1, 100.4, 100.42, 100.62))

> DT[,sum(value)]-401.54
[1] 0

Why do you think this happens?
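
For what it's worth, the same small difference should show up with plain sum() as well, since DT[,sum(value)] ends up calling base R's sum() here:

> sum(c(100.1, 100.4, 100.41, 100.63)) - 401.54
[1] -5.684342e-14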

onmaru
  • Computers have limitations when it comes to floating-point numbers (aka `double`, `numeric`, `float`). This is a fundamental limitation of computers in general, in how they deal with non-integer numbers. This is not specific to any one programming language. There are some add-on libraries or packages that are much better at arbitrary-precision math, but I believe most main-stream languages (this is relative/subjective, I admit) do not use these by default. Refs: https://stackoverflow.com/q/9508518, https://stackoverflow.com/q/588004, and https://en.wikipedia.org/wiki/IEEE_754 – r2evans May 01 '21 at 18:18
  • It's a little astonishing to me that this kind of error happens so easily in very simple code. Thanks for your comment and references. – onmaru May 02 '21 at 15:04
  • Unfortunately, the fact that it is astonishing does not diminish the fact that this is not uncommon to programming *in general*. This is not an [tag:r] thing, it is not a [tag:data.table] thing, it is something that affects everything that uses IEEE-754 for storage of floating-point numbers in a digital storage medium. Many languages make this less-simple by requiring strict adherence to object class (e.g., `integer` vs `numeric`). The fact that R is class-*permissive* helps reduce entry-cost for many, but can be a liability in *many* cases. – r2evans May 02 '21 at 15:21
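
As these comments note, the behavior comes from how IEEE-754 doubles are stored, not from data.table. A minimal sketch of what that looks like in R, recreating the first example above (assumes data.table is already loaded; sprintf() and all.equal() are base R functions):

# None of these decimal values is exactly representable as a binary double;
# printing with more digits shows (approximately) what is actually stored:
sprintf("%.20f", c(100.1, 100.4, 100.41, 100.63, 401.54))

# A common remedy is a tolerance-based comparison instead of exact equality:
DT<-data.table(value=c(100.1, 100.4, 100.41, 100.63))
isTRUE(all.equal(DT[,sum(value)], 401.54))
# [1] TRUE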

0 Answers