Referring to the question "Crashing R when calling `write.table` on particular data set", I can almost reliably crash 64-bit R (started with --vanilla) on 64-bit Windows by saving a large data.table in one session and writing it to CSV in another. I say "almost" because once (while demonstrating the crash to a guy in IT!) I got this message instead of a crash:

Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol,  :
  'getCharCE' must be called on a CHARSXP

referenced in the above question.

To crash R I just need to save(DT, file="datatablefile.RData")

and then, in another R session (which can be --vanilla), run

load("datatablefile.RData")
write.csv(DT, file='datatablefile.csv')

which will then crash after a minute or two. Note in particular that it will NOT crash if I say

load("datatablefile.RData")
library(data.table)
write.csv(DT, file='datatablefile.csv')
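
The only difference between the crashing and non-crashing sessions above is whether data.table is loaded before `write.csv` runs. A minimal sketch of a guard for the second session follows. The helper name `write_csv_safely` is hypothetical, and whether loading the namespace (rather than attaching the package with `library`) is enough to avoid the crash is an untested assumption.

```r
# Hypothetical helper (not part of data.table or base R).
# If x is a data.table, try to load the data.table namespace first so its
# S3 methods are registered; if the package is not installed, fall back to
# a plain data.frame before writing.
write_csv_safely <- function(x, file) {
  if (inherits(x, "data.table")) {
    if (!requireNamespace("data.table", quietly = TRUE)) {
      # data.table is unavailable: drop the self-reference and the class
      attr(x, ".internal.selfref") <- NULL
      class(x) <- "data.frame"
    }
  }
  write.csv(x, file = file)
  invisible(file)
}
```

Usage would be `load("datatablefile.RData"); write_csv_safely(DT, "datatablefile.csv")`.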

When I say something like

library(data.table)
N <- 1000
DT <- data.table(id=1:N, name=sample(letters, N, replace=TRUE))
save(DT, file='dttest.RData')

and then in another session

load('dttest.RData')
write.csv(DT, 'dttest.csv')

I don't get a crash...

There was the suggestion it might be linked to rbindlist(), so

library(data.table)
N <- 10000000
DT1 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE))
DT2 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE))
DT <- rbindlist(list(DT1, DT2))
save(DT, file='dttest.RData')

Note that I have tried this with N <- 10000000 on this 32 GB machine and it still works fine...

It has been suggested it might be due to factors?

library(data.table)
N <- 10000000
DT1 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE),
            code=as.factor(sample(letters[1:5], N, replace=TRUE)))
DT2 <- data.table(id=1:N, name=sample(letters, N, replace=TRUE),
            code=as.factor(sample(letters[1:5], N, replace=TRUE)))
DT <- rbindlist(list(DT1, DT2))
save(DT, file='dttest.RData')

str(DT)
Classes ‘data.table’ and 'data.frame':  20000000 obs. of  3 variables:
 $ id  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ name: chr  "v" "u" "t" "z" ...
 $ code: Factor w/ 5 levels "a","b","c","d",..: 2 5 4 2 2 1 2 3 2 4 ...
 - attr(*, ".internal.selfref")=<externalptr> 

Then in the other session

> load('dttest.RData')
> tables()
Error: could not find function "tables"
> str(DT)
Classes ‘data.table’ and 'data.frame':  20000000 obs. of  3 variables:
 $ id  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ name: chr  "v" "u" "t" "z" ...
 $ code: Factor w/ 5 levels "a","b","c","d",..: 2 5 4 2 2 1 2 3 2 4 ...
 - attr(*, ".internal.selfref")=<externalptr> 
> write.csv(DT, 'dttest.csv')

which then works fine...

It seems fine when I write a large data.table that contains chr, num and Date columns, but seems to fail when it contains Factors...
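
The 'getCharCE' message says that the C code behind `write.table` was handed something that is not a string (CHARSXP). One thing that could be checked, as a guess rather than a diagnosis, is whether every factor column still has a character `levels` attribute and integer codes within range. The helper name `check_factors` below is hypothetical; the sketch uses base R only, so it also runs in a session where data.table is not loaded.

```r
# Hypothetical diagnostic: one row per factor column, flagging anything
# that does not look like a well-formed factor.
check_factors <- function(x) {
  is_fac <- vapply(x, is.factor, logical(1))
  res <- lapply(names(x)[is_fac], function(nm) {
    col   <- x[[nm]]
    lev   <- attr(col, "levels")   # read the raw attribute, not levels()
    codes <- unclass(col)          # underlying integer codes
    data.frame(
      column         = nm,
      levels_are_chr = is.character(lev),
      n_levels       = length(lev),
      codes_are_int  = is.integer(codes),
      codes_in_range = all(is.na(codes) | (codes >= 1L & codes <= length(lev))),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, res)              # NULL when there are no factor columns
}
```

Running `check_factors(DT)` right after `load()` in the fresh session, before calling `write.csv`, would show whether any column reports `FALSE` in one of the logical fields.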

Any suggestions as to how I might create a reliable demonstration that replicates this? The contents of the tables themselves are highly confidential.
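
One possible approach to sharing a reproduction without exposing confidential values is to keep the table's shape (row count, column names, column classes, number of factor levels) and replace every value with synthetic data. The sketch below is base R only; `scramble_table` is a hypothetical name, it assumes the table holds only integer, numeric, character, factor and Date columns, and there is no guarantee that the scrambled copy still triggers the crash.

```r
# Hypothetical helper: build a synthetic data.frame with the same structure
# as x but none of its values.
scramble_table <- function(x) {
  n <- nrow(x)
  out <- lapply(x, function(col) {
    if (is.factor(col)) {
      k <- max(1L, nlevels(col))   # keep the number of levels, not the labels
      factor(sample.int(k, n, replace = TRUE), levels = seq_len(k),
             labels = paste0("L", seq_len(k)))
    } else if (inherits(col, "Date")) {
      as.Date("2000-01-01") + sample.int(3650L, n, replace = TRUE)
    } else if (is.character(col)) {
      sample(letters, n, replace = TRUE)
    } else if (is.integer(col)) {
      sample.int(max(1L, n), n, replace = TRUE)
    } else if (is.numeric(col)) {
      runif(n)
    } else {
      stop("unhandled column class: ", class(col)[1])
    }
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}
```

The result is a plain data.frame, so it would need converting back (for example with `data.table::as.data.table`) and passing through the same `rbindlist`, `save` and `load` steps before testing `write.csv` on it.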

Update: I've just tried doing

   setkey(DT,id)

but it didn't cause a crash.

Sean
  • Yes it could well be linked to `rbindlist` and factors, if you are using `rbindlist` to create the large `data.table`. This has been fixed in v1.8.9 on R-Forge, please could you upgrade and try again. – Matt Dowle Jun 09 '13 at 13:14
  • @MatthewDowle Sorry Matthew, but in R-3.0.0 or R-3.0.1 I am unfortunately not able to replicate this! I'm afraid I don't have a copy of R-2.15.3 anymore - this is a restricted environment. – Sean Jul 01 '13 at 08:40
  • @Sean consider to close the question if you are not able to reproduce it anymore. – jangorecki Jan 16 '15 at 19:36

0 Answers