Just to add to @Tens great answer.
Three things seem to be happening here:
- You have a floating point issue (as mentioned already)
- You are using an old data.table version
- Secondary indices are kicking in without you being aware of it
Using your setup
library(data.table)
options(digits = 20) # to see number representation
mround <- function (number, multiple) {
return(multiple * round(number / multiple))
}
DT = data.table(a = mround(112.3, 0.1), b = "B")
So let's address the points above. You have a floating point issue, and quoting ?setNumericRounding:
Computers cannot represent some floating point numbers (such as 0.6) precisely, using base 2. This leads to unexpected behaviour when joining or grouping columns of type 'numeric'; i.e. 'double'
This led the data.table devs to implement setNumericRounding, which automatically rounds floats so that the radix algorithm behaves as expected.
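To make the floating point issue concrete, here is a minimal base R sketch (no data.table involved) using the same mround helper as in the question. The variable names exact_match and close_match are mine, purely for illustration. It compares the rounded value to the literal 112.3 exactly and then with a tolerance via all.equal.

```r
# Same helper as in the question
mround <- function(number, multiple) multiple * round(number / multiple)

x <- mround(112.3, 0.1)

# Exact comparison: the two doubles differ in their last bits
exact_match <- x == 112.3

# Tolerance-based comparison (all.equal's default tolerance is about 1.5e-8)
close_match <- isTRUE(all.equal(x, 112.3))

# Show both values with 17 significant digits, then the two comparison results
print(sprintf("%.17g", c(x, 112.3)))
print(c(exact = exact_match, close = close_match))
```

The exact comparison fails while the tolerance-based one succeeds, which is the same mismatch that a plain vector scan with == runs into.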
Prior to v1.9.8, setNumericRounding(2) was the default (hence your first example works), but after some complaints from users about inconsistency on GH (IIRC), since v1.9.8 the default was set back to setNumericRounding(0) in order to be consistent with data.frame behavior (see here). So if you update your data.table to the latest version, you will see that data.table and data.frame behave the same for both of your examples (and both of your examples will fail).
Compare
setNumericRounding(0)
DT[a == 112.3]
## Empty data.table (0 rows) of 2 cols: a,b
To
setNumericRounding(1)
DT[a == 112.3]
# a b
# 1: 112.30000000000001 B
So you may ask, "what on earth does the radix algorithm have to do with anything here?" This brings us to the third point above: secondary indices (please read this). Let's see what actually happens when you run your code above
options(datatable.verbose = TRUE)
DT[a == 112.3] # works as expected, i.e returns one row
# Creating new index 'a' <~~~~
# forder took 0 sec
# Starting bmerge ...done in 0 secs
# a b
# 1: 112.30000000000001 B
Let's check the new secondary indices
indices(DT)
#[1] "a"
When you ran ==, data.table set a as your secondary index in order to perform future operations much more efficiently (this was introduced in v1.9.4, see here). In other words, you performed a binary join on a instead of the usual vector scan, as was the case prior to v1.9.4. (As a side note, this can be disabled with options(datatable.auto.index = FALSE); in that case, none of your examples will work even with setNumericRounding(1), unless you explicitly specify a key using setkey or the on argument.)
This probably also explains why
DT[a == 112.30000 & b == 'B']
doesn't work. You are subsetting by two columns here, and neither secondary indices nor a binary join kick in (automatically) for expressions such as == & == (yet). Hence you did a normal vector scan, and setNumericRounding(1) didn't kick in.
You can, however, set the keys manually and make it work. For instance (like I commented under @Tens answer), you can do
setNumericRounding(1) # make sure auto-rounding is turned on
DT[.(112.3, 'B'), nomatch = 0L, on = .(a, b)]
# Calculated ad hoc index in 0 secs
# Starting bmerge ...done in 0 secs
# a b
# 1: 112.3 B
Or using the old way
setkey(DT, a, b)
DT[.(112.3, 'B'), nomatch = 0L]
# Starting bmerge ...done in 0 secs
# a b
# 1: 112.3 B
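As a further option that relies on neither setNumericRounding nor indices, you could keep the plain vector scan and compare with an explicit tolerance. This is only a sketch of a common workaround, not something data.table does for you; tol is a hypothetical name and its value is an assumption you should adapt to your data.

```r
library(data.table)

# Same setup as in the question
mround <- function(number, multiple) multiple * round(number / multiple)
DT <- data.table(a = mround(112.3, 0.1), b = "B")

# Hypothetical tolerance; choose a value that suits the scale of your data
tol <- 1e-8

# Plain vector scan: keep rows where a is within tol of 112.3 and b equals "B"
res <- DT[abs(a - 112.3) < tol & b == "B"]
print(nrow(res))
```

Because nothing here is an exact comparison of doubles, the result does not depend on the rounding setting or on whether auto-indexing is enabled.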