
I know that when I'm building up a data.table row-by-row, it's best to pre-allocate space:

library(data.table)
dt <- data.table(x=rep(0,1000), y=rep(0,1000))  # pre-allocate 1000 rows
for(i in 1L:1000L) {
    set(dt, i, 1L, runif(1))   # fill column 1 (x) by reference
    set(dt, i, 2L, rnorm(1))   # fill column 2 (y) by reference
}
}

(In fact, if I don't pre-allocate, I get a segmentation fault with that code.)

If I don't know the number of rows in advance, I'll need to grow the table dynamically, presumably doubling the allocated size each time it fills up. Will I need to manage that process myself, or is there any existing support in data.table for dynamic growth?
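Here's the sort of doubling scheme I have in mind, as a sketch. The capacity bookkeeping (`cap`, `n`) is entirely manual and not anything data.table provides; each doubling still copies the table once via `rbindlist`, but that only happens O(log n) times.

```r
library(data.table)

cap <- 4L                          # current allocated capacity (small for demonstration)
n   <- 0L                          # number of rows actually filled
dt  <- data.table(x = numeric(cap), y = numeric(cap))

for (i in 1:10) {
    if (n == cap) {                # out of space: double the allocation
        dt  <- rbindlist(list(dt, data.table(x = numeric(cap), y = numeric(cap))))
        cap <- cap * 2L
    }
    n <- n + 1L
    set(dt, n, 1L, runif(1))       # fill column 1 (x) by reference
    set(dt, n, 2L, rnorm(1))       # fill column 2 (y) by reference
}

dt <- dt[1:n]                      # trim the unused tail when done
```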

Also, when I'm done appending rows I'll probably have some allocated space left over. Is there a truncate() method or similar, or should I just do dt <- dt[1:n,]?

Ken Williams
  • Regarding the segfault, I was going to post it as a bug report, but I can't always seem to duplicate it. It might have something to do with RStudio & the way it refreshes the session when building packages. – Ken Williams Jul 10 '13 at 17:17
  • 1
    How about making a temp data.table and then `rbindlist` the "master" with the temp? – Dean MacGregor Jul 10 '13 at 17:22
  • 1
    I think that's basically the same as simply `rbind`-ing a new extension on the bottom each time I need to grow, right? – Ken Williams Jul 10 '13 at 20:29
  • I think the suggestion is not to `rbind` to a `data.table`, but put the results in a list and then `rbindlist` everything together. That will avoid copying on every step. – eddi Jul 10 '13 at 22:05
  • There are plans to have "insert rows by reference" functionality - see http://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-r-data-table – mnel Jul 11 '13 at 00:07
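The list-then-`rbindlist` approach suggested in the comments might look like the sketch below. The chunk contents are illustrative; the point is that each iteration only appends to an ordinary list, and the single `rbindlist` call at the end does one copy for all rows.

```r
library(data.table)

# Accumulate each chunk (or single row) in a plain list...
chunks <- vector("list", 5L)
for (i in 1:5) {
    chunks[[i]] <- data.table(x = runif(1), y = rnorm(1))
}

# ...then bind everything together in one pass.
dt <- rbindlist(chunks)
```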
