I am unable to produce the expected results with tableB[tableA] on my data, although the same join works fine on simple example data. What am I doing wrong?


> tableA <- data.table(col1 = c( 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6,        1.7, 1.8, 1.9), key = 'col1')

> tableA
    col1
 1:  1.0
 2:  1.1
 3:  1.2
 4:  1.3
 5:  1.4
 6:  1.5
 7:  1.6
 8:  1.7
 9:  1.8
10:  1.9

> tableB <- data.table(col1 = c( 1.0, 1.2, 1.5, 1.9), col2 = c( "A", "B", "C", "D"), col3 = c( "AA", "BB", "CC", "DD"), key = 'col1')

> tableB
   col1 col2 col3
1:  1.0    A   AA
2:  1.2    B   BB
3:  1.5    C   CC
4:  1.9    D   DD

> tableA <- tableB[ tableA]

> tableA
    col1 col2 col3
 1:  1.0    A   AA
 2:  1.1 <NA> <NA>
 3:  1.2    B   BB
 4:  1.3 <NA> <NA>
 5:  1.4 <NA> <NA>
 6:  1.5    C   CC
 7:  1.6 <NA> <NA>
 8:  1.7 <NA> <NA>
 9:  1.8 <NA> <NA>
10:  1.9    D   DD

which is what I expected. But with my actual data:

> tableA <- data.table( V1 = seq( 1, by = 0.1, length.out = 20), key = 'V1')

> tableA
     V1
 1: 1.0
 2: 1.1
 3: 1.2
 4: 1.3
 5: 1.4
 6: 1.5
 7: 1.6
 8: 1.7
 9: 1.8
10: 1.9
11: 2.0
12: 2.1
13: 2.2
14: 2.3
15: 2.4
16: 2.5
17: 2.6
18: 2.7
19: 2.8
20: 2.9

> tableB <- fread( file = "C:/Users/Vj/Desktop/data backup/ch1.csv", header = FALSE, sep = ",", key = 'V1')

> tableB
     V1      V2      V3
 1: 1.0 0.90812 1.17372
 2: 1.1 0.91312 1.16307
 3: 1.2 0.91783 1.16928
 4: 1.3 0.93506 1.16695
 5: 1.5 0.91891 1.16016
 6: 1.6 0.90138 1.17475
 7: 1.7 0.90008 1.17295
 8: 1.9 0.90542 1.14948
 9: 2.0 0.91563 1.16735
10: 2.2 0.91167 1.16976
11: 2.3 0.90378 1.17025
12: 2.4 0.90938 1.17165
13: 2.5 0.88599 1.17586
14: 2.6 0.90107 1.18052
15: 2.7 0.90451 1.14228
16: 2.9 0.90673 1.16695

> tableA <- tableB[ tableA]

> tableA
     V1      V2      V3
 1: 1.0 0.90812 1.17372
 2: 1.1 0.91312 1.16307
 3: 1.2 0.91783 1.16928
 4: 1.3 0.93506 1.16695
 5: 1.4      NA      NA
 6: 1.5 0.91891 1.16016
 7: 1.6 0.90138 1.17475
 8: 1.7      NA      NA
 9: 1.8      NA      NA
10: 1.9 0.90542 1.14948
11: 2.0 0.91563 1.16735
12: 2.1      NA      NA
13: 2.2 0.91167 1.16976
14: 2.3 0.90378 1.17025
15: 2.4      NA      NA
16: 2.5 0.88599 1.17586
17: 2.6 0.90107 1.18052
18: 2.7 0.90451 1.14228
19: 2.8      NA      NA
20: 2.9      NA      NA

It's not a typo. I can reproduce exactly the same results again and again. Any insight would be valuable.

There are no errors. I expect 16 valid rows and 4 rows of NA, but I am getting only 13 valid rows and 7 rows of NA.

VjSwamy

1 Answer

This is due to floating-point error in the decimal values in V1: the values generated by seq() are not all bit-for-bit identical to the values read from the CSV, so some keys fail to match. This is not an R (or data.table) issue; it's just the way computers represent decimal numbers.
See "Why are these numbers not equal?" for further information.
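As a rough illustration of the effect (a sketch in base R, not taken from the question's file): the first part is the classic 0.1 + 0.2 case, and the second part compares the seq() values with the same numbers parsed from one-decimal text, which is assumed here to stand in for what fread reads from the CSV. The variable names are made up for the example.

```r
# Classic case: the sum is not bit-for-bit equal to the literal 0.3.
a <- 0.1 + 0.2
exact_equal    <- (a == 0.3)                  # FALSE: exact comparison fails
tolerant_equal <- isTRUE(all.equal(a, 0.3))   # TRUE: comparison with tolerance

# The question's key column, built arithmetically by seq().
v <- seq(1, by = 0.1, length.out = 20)

# The same values as they would be parsed from text such as "1.0", "1.1", ...
# (assumption: this mimics reading the keys from a CSV file).
parsed <- as.numeric(sprintf("%.1f", v))

# Elements whose computed value differs from the parsed value;
# these are the keys an exact join can miss.
print(v[v != parsed])

# Printing with 17 significant digits exposes the hidden difference.
print(sprintf("%.17g", v))
```

Any key that shows up in the first print() is one that an exact numeric join will treat as different from the value in the file, even though both display as the same number at the default print precision.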

To prevent an 'error' like this, one solution is to convert the join columns to character.

tableA[, V1 := as.character(V1)]
tableB[, V1 := as.character(V1)]

tableB[tableA, on = .(V1)]

which gives the expected result:

     V1      V2      V3
 1:   1 0.90812 1.17372
 2: 1.1 0.91312 1.16307
 3: 1.2 0.91783 1.16928
 4: 1.3 0.93506 1.16695
 5: 1.4      NA      NA
 6: 1.5 0.91891 1.16016
 7: 1.6 0.90138 1.17475
 8: 1.7 0.90008 1.17295
 9: 1.8      NA      NA
10: 1.9 0.90542 1.14948
11:   2 0.91563 1.16735
12: 2.1      NA      NA
13: 2.2 0.91167 1.16976
14: 2.3 0.90378 1.17025
15: 2.4 0.90938 1.17165
16: 2.5 0.88599 1.17586
17: 2.6 0.90107 1.18052
18: 2.7 0.90451 1.14228
19: 2.8      NA      NA
20: 2.9 0.90673 1.16695
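If you would rather keep a numeric key, a different technique from the character conversion above is to join on an integer version of the key. The sketch below assumes all keys have exactly one decimal place; the column name key10 and the small stand-in for tableB are hypothetical, since the contents of ch1.csv are not reproduced here.

```r
library(data.table)

# Keys generated arithmetically, as in the question.
tableA <- data.table(V1 = seq(1, by = 0.1, length.out = 20))

# Hypothetical stand-in for the CSV data: keys written as decimal literals.
tableB <- data.table(V1 = c(1.0, 1.1, 1.7, 2.4, 2.9),
                     V2 = c("a", "b", "c", "d", "e"))

# Scale to tenths and round to integer, so tiny floating-point
# differences can no longer affect the comparison.
tableA[, key10 := as.integer(round(V1 * 10))]
tableB[, key10 := as.integer(round(V1 * 10))]

# Join on the integer key; keep tableA's V1 (i.V1) and tableB's V2.
res <- tableB[tableA, on = .(key10), .(V1 = i.V1, V2)]
print(res)
```

Integers compare exactly, so every key present in both tables matches. The scaling factor has to suit the precision of the data (10 for one decimal place, 100 for two, and so on).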
Wimpel
  • VjSwamy: glad it worked. If a given answer suits your needs, you can accept it as an answer. – Wimpel Jun 25 '19 at 15:55