I have a script that builds a data table by inserting data row by row in a loop. The insertion is done with rbindlist(). This approach seems to be very expensive, because my data table dt appears to be reallocated in every iteration:

tracemem(dt)
[1] "<0x2bd3d00>"
tracemem(dt <- rbindlist(list(dt, newrow)))
[1] "<0x44a7fe0>"

Some old comments (~3 years) in this question mention plans for an insert() method, but I have not found any update on it. Is there a memory-efficient way to do this?

  • You can follow such things on the project's issue tracker. For insertion, see https://github.com/Rdatatable/data.table/issues/660, and for deletion, see https://github.com/Rdatatable/data.table/issues/635 – Frank Jun 02 '16 at 12:21

1 Answer

You are growing an object in a loop. This is slow regardless of whether the object is a data.table.

One of the reasons data.table is so efficient is that it over-allocates, i.e., when it is created it reserves memory for columns that do not exist yet. You need to do something similar for rows: create all the empty rows you will need in your loop, bind them to the data.table in one step, and then fill them by assignment inside the loop, preferably using set().
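
The approach described above could look roughly like the following. This is only a sketch under assumptions: the column names id and value, the starting table, and the row count n_new are made up for illustration, and it assumes the number of rows is known (or can be bounded) before the loop starts.

```r
library(data.table)

# Hypothetical existing table; column names are illustrative only
dt <- data.table(id = 1L, value = 0.5)

n_new  <- 1000L     # assumed: number of rows the loop will add
offset <- nrow(dt)  # first free row is offset + 1

# Build all empty rows at once, with typed NA placeholders
empty <- data.table(id = rep(NA_integer_, n_new),
                    value = rep(NA_real_, n_new))

# Single bind, so the table is reallocated once instead of once per row
dt <- rbindlist(list(dt, empty))

addr_before <- address(dt)

for (k in seq_len(n_new)) {
  # set() assigns by reference, so dt is not copied here
  set(dt, i = offset + k, j = "id",    value = k)
  set(dt, i = offset + k, j = "value", value = k / 2)
}

addr_after <- address(dt)
```

Comparing addr_before and addr_after with identical() is one way to check that the loop did not reallocate dt. If only an upper bound on the row count is known, the unused rows can be dropped after the loop by subsetting to the rows actually filled.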

Roland