Background
I tried to replace some CSV output files with rds files to improve efficiency. These are intermediate files that serve as inputs to other R scripts.
Question
I started investigating when my scripts failed and found that readRDS() and load() do not return data tables that are identical to the original. Is this supposed to happen, or did I miss something?
Sample code
library( data.table )
# data.table: the round trip through saveRDS()/readRDS() is not identical
aDT <- data.table( a=1:10, b=LETTERS[1:10] )
saveRDS( aDT, file = "aDT.rds")
bDT <- readRDS( file = "aDT.rds" )
identical( aDT, bDT, ignore.environment = TRUE ) # Gives FALSE
# data.frame: the same round trip is identical
aDF <- data.frame( a=1:10, b=LETTERS[1:10] )
saveRDS( aDF, file = "aDF.rds")
bDF <- readRDS( file = "aDF.rds" )
identical( aDF, bDF, ignore.environment = TRUE ) # Gives TRUE
# Using save() and load() doesn't help either
aDT2 <- data.table( a=1:10, b=LETTERS[1:10] )
save( aDT2, file = "aDT2.RData")
bDT2 <- aDT2; rm( aDT2 ) # keep the original under another name
load( file = "aDT2.RData" ) # restores aDT2 from disk
identical( aDT2, bDT2, ignore.environment = TRUE ) # Gives FALSE
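One way to narrow down where the two objects differ is to compare the columns and the attributes separately rather than the whole object at once. This is only a minimal diagnostic sketch: the names tmp, cols_match, attr_names and attr_match are made up for illustration, and it writes to a temporary file instead of "aDT.rds". It makes no claim about what the cause is; it only reports which pieces compare as identical.

```r
library(data.table)

# Round-trip a small table through a temporary rds file
aDT <- data.table(a = 1:10, b = LETTERS[1:10])
tmp <- tempfile(fileext = ".rds")
saveRDS(aDT, file = tmp)
bDT <- readRDS(file = tmp)

# Compare the column data on its own, one column at a time
cols_match <- vapply(
  names(aDT),
  function(nm) identical(aDT[[nm]], bDT[[nm]]),
  logical(1)
)
print(cols_match)

# Compare each attribute separately to see which one (if any) differs
attr_names <- union(names(attributes(aDT)), names(attributes(bDT)))
attr_match <- vapply(
  attr_names,
  function(nm) identical(attr(aDT, nm, exact = TRUE), attr(bDT, nm, exact = TRUE)),
  logical(1)
)
print(attr_match)
```

Any attribute reported as FALSE in attr_match is a candidate for what makes identical() fail on the whole object, and can then be printed for both tables with attr().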
I am running R version 3.2.0 on Linux Mint and have tested with data.table versions 1.9.4 and 1.9.5 (latest).
Searching SO and Google returned this and this, but I don't think they answer this issue. I am still trying to figure out why my scripts failed when I switched to rds, but I am starting with this.
I would appreciate it very much if knowledgeable SO members could help. Thanks!
Edit:
Hi everyone, I happened to find a way to resolve the issue and have posted the solution below. I apologise if it's rather inelegant. Now I have 2 further questions:
(1) Is there a better way?
(2) Can something be done in the R and/or data.table code to resolve this? This issue causes unpredictable bugs and is not the first thing that comes to mind. My 2 cents' worth.