Referring to the question Crashing R when calling `write.table` on particular data set, I can almost "reliably" crash 64-bit R --vanilla on 64-bit Windows by saving a large data.table in one session and writing it out from another. I say "almost" because once (when demonstrating the crash to a guy in IT!), instead of a crash, I got the message
Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, :
'getCharCE' must be called on a CHARSXP
referenced in the above question.
To crash R, I just need to run save(DT, file="datatablefile.RData") in one session and then, in another R session (which could be --vanilla), run
load("datatablefile.RData")
write.csv(DT, file='datatablefile.csv')
which will then crash after a minute or two. Note in particular that it will NOT crash if I say
load("datatablefile.RData")
library(data.table)
write.csv(DT, file='datatablefile.csv')
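To compare the crashing and non-crashing cases, it might help to record what each session looks like right after load(). A minimal base-R sketch; describe_session is a hypothetical helper name of my own, not part of R or data.table:

```r
# Hypothetical helper: summarise how a loaded object looks in this session.
# Uses base R only, so it also runs in a --vanilla session.
describe_session <- function(obj) {
  list(
    # Class vector stored with the object (survives save()/load()).
    classes = class(obj),
    # Whether the data.table namespace is loaded in this session.
    data_table_loaded = "data.table" %in% loadedNamespaces(),
    # First class of every column, e.g. "integer", "character", "factor".
    column_classes = vapply(obj, function(col) class(col)[1], character(1))
  )
}
```

Calling describe_session(DT) before and after library(data.table) would show whether anything other than the loaded namespace differs between the two cases.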
When I say something like
library(data.table)
N <- 1000
DT <- data.table(id=1:N, name=sample(letters, N, replace=TRUE))
save(DT, file='dttest.RData')
and then in another session
load('dttest.RData')
write.csv(DT, 'dttest.csv')
I don't get a crash...
There was the suggestion it might be linked to rbindlist(), so
library(data.table)
N <- 10000000
DT1 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE))
DT2 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE))
DT <- rbindlist(list(DT1, DT2))
save(DT, file='dttest.RData')
Note that I have tried this with N <- 10000000 on this 32 GB machine and it still works fine...
It has been suggested it might be due to factors?
library(data.table)
N <- 10000000  # gives the 20000000 obs. shown in the str() output below
DT1 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE),
                  code=as.factor(sample(letters[1:5], N, replace=TRUE)))
DT2 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE),
                  code=as.factor(sample(letters[1:5], N, replace=TRUE)))
DT <- rbindlist(list(DT1, DT2))
save(DT, file='dttest.RData')
str(DT)
Classes ‘data.table’ and 'data.frame': 20000000 obs. of 3 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ name: chr "v" "u" "t" "z" ...
$ code: Factor w/ 5 levels "a","b","c","d",..: 2 5 4 2 2 1 2 3 2 4 ...
- attr(*, ".internal.selfref")=<externalptr>
Then in the other session
> load('dttest.RData')
> tables()
Error: could not find function "tables"
> str(DT)
Classes ‘data.table’ and 'data.frame': 20000000 obs. of 3 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ name: chr "v" "u" "t" "z" ...
$ code: Factor w/ 5 levels "a","b","c","d",..: 2 5 4 2 2 1 2 3 2 4 ...
- attr(*, ".internal.selfref")=<externalptr>
> write.csv(DT, 'dttest.csv')
which then works fine...
It seems fine when I write a large data.table that contains chr, num and Date columns, but it seems to fail when the table contains Factors...
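If factors really are involved, one thing that could be checked without revealing any values is whether each factor column is internally consistent. The sketch below is only a guess at a useful diagnostic, not a known cause of the crash; check_factor_columns is a hypothetical helper name, and it uses base R only:

```r
# Hypothetical helper: sanity-check the internal structure of factor columns.
# unclass() is used so that no data.table or data.frame method is dispatched.
check_factor_columns <- function(df) {
  factor_cols <- Filter(is.factor, unclass(df))
  lapply(factor_cols, function(f) {
    codes <- as.integer(unclass(f))
    lev <- attr(f, "levels")
    list(
      # A well-formed factor is stored as an integer vector...
      integer_storage = typeof(f) == "integer",
      # ...with a character vector of levels...
      character_levels = is.character(lev),
      # ...and every non-NA code points at an existing level.
      codes_in_range = all(is.na(codes) | (codes >= 1L & codes <= length(lev)))
    )
  })
}
```

Running check_factor_columns(DT) in the session that crashes (before calling write.csv) would at least rule a malformed factor in or out.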
Any suggestions as to how I might create a reliable demonstration that replicates this? The contents of the tables themselves are highly confidential.
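One approach I could imagine is to copy only the structure of the confidential table (column classes and number of factor levels) and fill it with random values. A rough base-R sketch, assuming only integer, numeric, character, factor and Date columns; summarise_structure and make_synthetic are hypothetical names of my own, and there is no guarantee a synthetic table triggers the same crash:

```r
# Hypothetical helper: keep only structural facts about each column.
# No actual values from the table are retained.
summarise_structure <- function(df) {
  lapply(df, function(col) {
    list(
      class = class(col)[1],
      n_levels = if (is.factor(col)) nlevels(col) else NA_integer_
    )
  })
}

# Hypothetical helper: build n rows of random data matching that structure.
make_synthetic <- function(spec, n) {
  cols <- lapply(spec, function(s) {
    switch(s$class,
      integer   = sample.int(n, n, replace = TRUE),
      numeric   = runif(n),
      character = sample(letters, n, replace = TRUE),
      factor    = {
        lev <- paste0("lev", seq_len(s$n_levels))
        factor(sample(lev, n, replace = TRUE), levels = lev)
      },
      Date      = as.Date("2000-01-01") + sample.int(3650, n, replace = TRUE),
      stop("unhandled column class: ", s$class)
    )
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}
```

The result could then be converted with data.table::as.data.table(), saved with save(), and loaded in a fresh session to see whether write.csv still crashes.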
Update: I've just tried doing
setkey(DT,id)
but it didn't cause a crash.