
Caveat: novice. I have several data.tables with millions of rows each; the variables are mostly dates and factors. I was using rbindlist() to combine them. Yesterday, after breaking the tables up into smaller pieces vertically (instead of the current horizontal splicing), I was trying to understand rbind better (especially with fill = TRUE). I also tried bind_rows() and then tried to verify that the results matched, but identical() returned FALSE.

library(data.table)
library(dplyr)
DT1 <- data.table(a=1, b=2)
DT2 <- data.table(a=4, b=3)
DT_bindrows <- bind_rows(DT1,DT2)
DT_rbind <- rbind(DT1,DT2)
identical(DT_bindrows,DT_rbind)
 # [1] FALSE

Visually inspecting the results from bind_rows() and rbind() suggests they are indeed identical. I read this and this (from where I adapted the example). My questions: (a) what am I missing, and (b) if the number, names, and order of my columns are the same, should I be concerned that identical() returns FALSE?

armipunk

1 Answer


identical() also compares attributes, and here the two objects do not have the same attributes. all.equal() has an option to skip the attribute comparison (check.attributes = FALSE):

all.equal(DT_bindrows, DT_rbind, check.attributes = FALSE)
#[1] TRUE
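This is general R behaviour, not something specific to data.table or dplyr. A minimal base-R sketch (no packages needed) shows the same pattern; the attribute name `note` is made up purely for illustration and stands in for `.internal.selfref`:

```r
# Two data frames with the same columns and values
df1 <- data.frame(a = c(1, 4), b = c(2, 3))
df2 <- data.frame(a = c(1, 4), b = c(2, 3))

# Give df2 one extra attribute (hypothetical name, illustration only)
attr(df2, "note") <- "extra"

identical(df1, df2)                            # FALSE: the attribute sets differ
isTRUE(all.equal(df1, df2))                    # FALSE: all.equal() checks attributes by default
all.equal(df1, df2, check.attributes = FALSE)  # TRUE: only the column values are compared
```

Without the extra attribute, `identical(df1, df2)` would be TRUE, so the FALSE comes entirely from the attribute.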

If we check the str() of both datasets, the difference becomes clear:

str(DT_bindrows)
#Classes ‘data.table’ and 'data.frame': 2 obs. of  2 variables:
# $ a: num  1 4
# $ b: num  2 3
str(DT_rbind)
#Classes ‘data.table’ and 'data.frame': 2 obs. of  2 variables:
# $ a: num  1 4
# $ b: num  2 3
# - attr(*, ".internal.selfref")=<externalptr> # reference attribute 
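Instead of eyeballing str() output, the attribute names can be compared directly. `attr_diff()` below is a hypothetical helper written for this sketch, not a function from base R, data.table, or dplyr. It only compares attribute *names*, so an attribute present in both objects with different values would not be flagged:

```r
# Hypothetical helper: attribute names present in one object but not the other
attr_diff <- function(x, y) {
  nx <- names(attributes(x))
  ny <- names(attributes(y))
  list(only_in_x = setdiff(nx, ny), only_in_y = setdiff(ny, nx))
}

df1 <- data.frame(a = c(1, 4), b = c(2, 3))
df2 <- df1
attr(df2, "note") <- "extra"  # stand-in for .internal.selfref

attr_diff(df1, df2)
# $only_in_x
# character(0)
#
# $only_in_y
# [1] "note"
```

Given the str() output above, `attr_diff(DT_bindrows, DT_rbind)` should report `".internal.selfref"` under `only_in_y`.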

After setting that attribute to NULL, identical() returns TRUE:

attr(DT_rbind, ".internal.selfref") <- NULL
identical(DT_bindrows, DT_rbind)
#[1] TRUE
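Setting the attribute to NULL on `DT_rbind` itself changes that object. A variant is to drop the attribute only inside a function, so the caller's objects keep it. `identical_ignoring()` is a hypothetical helper for this sketch, assuming R's usual copy-on-modify semantics for `attr<-` applied to function arguments:

```r
# Hypothetical helper: identical() after dropping the named attributes from local copies
identical_ignoring <- function(x, y, ignore = ".internal.selfref") {
  for (a in ignore) {
    attr(x, a) <- NULL  # copy-on-modify: only the local copy loses the attribute
    attr(y, a) <- NULL  # removing an attribute that is absent is a no-op
  }
  identical(x, y)
}

df1 <- data.frame(a = c(1, 4), b = c(2, 3))
df2 <- df1
attr(df2, "note") <- "extra"  # illustration only

identical_ignoring(df1, df2, ignore = "note")  # TRUE
attr(df2, "note")                              # still "extra" in the caller
```

Applied to the question's objects, `identical_ignoring(DT_bindrows, DT_rbind)` would be expected to return TRUE while leaving `DT_rbind` with its `.internal.selfref`, which matters given the comment below that a data.table without that attribute may not work properly.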
akrun
  • I did not know of all.equal(), and so much else besides! Thank you for explaining it so well. – armipunk Jul 25 '18 at 16:37
  • @armipunk FYI, without that attribute, it may fail to be a properly functioning data.table. E.g., try `bind_rows(DT1,DT2)[, gah := 11]`, which on my system gives a warning about the missing attribute. Better not to use bind_rows unless it's supported through a data.table method in dtplyr, I guess: https://github.com/hadley/dtplyr – Frank Jul 25 '18 at 16:52
  • @Frank Thank you. So far I have been pretty much sticking to data.table functions, but as I start to mix things up a bit, this is good to know. – armipunk Jul 25 '18 at 17:32