Obvious `merge` leads to NA's. How to debug differences between data.tables?

Question

I have two data.table and I am trying to merge them together. The first one has the columns: code offers The second one has the columns: code offers nb_offers I am trying to add the third column to the first data.table through a merge.

The minimal files are available here:

dt1 = https://ufile.io/vjeju
dt2 = https://ufile.io/l61jc

My issue is that for some rows, the values aren't filled while they're obviously here.

A working example:

> library(data.table)
> dt1 <- data.table(readRDS("dt1.rds"))
> dt2 <- data.table(readRDS("dt2.rds"))

> dt1[offers == "0.05"]
    code offers
1: 24428   0.05

> dt2[offers == "0.05"]
    code offers nb_offers
1: 24428   0.05         5

> merge(dt1[offers == "0.05"], dt2[offers == "0.05"], 
        by = c("code", "offers"))
    code offers nb_offers
1: 24428   0.05         5

Let's change things very slightly:

> dt1[offers == "0.06"]
    code offers
1: 24428   0.06

> dt2[offers == "0.06"]
    code offers nb_offers
1: 24428   0.06         3

> merge(dt1[offers == "0.06"], dt2[offers == "0.06"], 
        by = c("code", "offers"))
Empty data.table (0 rows) of 3 cols: code,offers,nb_offers

This makes absolutely no sense to me.

I tried to replicate the error, but if I create the data.tables from scratch, the error doesn't occurr:

> tmp1 <- data.table(code = "24428", offers = 0.06)
> tmp2 <- data.table(code = "24428", offers = 0.06, nb_offers = 3)
> setkey(tmp2, code)
> merge(tmp1, tmp2, by = c("code", "offers"))
    code offers nb_offers
1: 24428   0.06         3
> all.equal(tmp1, dt1[offers == "0.06"])
[1] TRUE
> all.equal(tmp2, dt2[offers == "0.06"])
[1] TRUE

I understand I can do what I want to do in other ways, but I'd like to understand what's going on here.

Thank you.

I can't download the files right now, but the first thing i'll check is floating point precision — Jean, Aug 01 '17 at 09:42
From your files, `dt1$offers-0.06` gives `[1] -1.000000e-02 6.938894e-18`, so it is a floating point precision problem. For some reason doing the same with `dt2` gives exactly zero for the second value. — Andrew Gustar, Aug 01 '17 at 09:51
Yep, that works. I applied `round(x, 2)` to both columns and now the merge works well. That's still super weird considering it works for 0.05 (and many others) but not 0.06 (and many others) without clear patterns in what works and what doesn't. Edit: dt1 comes from text that's been converted with `as.numeric` and dt2 comes from `seq` — Charles, Aug 01 '17 at 09:57
Have a read of [this post](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal). Conversion from decimal to binary representations is tricky. — lmo, Aug 01 '17 at 12:03

Obvious `merge` leads to NA's. How to debug differences between data.tables?

0 Answers0