
If I have a data frame

set.seed(12345) 
df=data.frame(a=rnorm(5),b=rnorm(5))

I can add a row by e.g.

df[6,] =c(5,6)

If I now do the equivalent in data.table

library(data.table)
dt=data.table(df)
dt[6,]=c(5,6)

It fails with an error. What is the right way to insert a row into a data.table?

Alex Brown
Tahnoon Pasha
  • 2
    I think an `insert()` function is planned for this package to make it relatively fast to add rows, but as of now, you have to preallocate the `nrow` of the data table. Maybe this: http://r-forge.r-project.org/tracker/index.php?func=detail&aid=1458&group_id=240&atid=978 – Frank May 20 '13 at 15:11
  • 13
    Is `rbind(dt,list(5,6))` sufficient for your purpose? – Roland May 20 '13 at 15:25
  • 2
    btw ime every time I thought I needed to add data row by row, I was thinking C-style and not R-style - so aside from the above comments you should reconsider whether or not you actually need to do this – eddi May 20 '13 at 15:35
  • 1
    I think this is almost the same question...? http://stackoverflow.com/questions/16792001/add-a-row-by-reference-at-the-end-of-a-data-table-object – Frank May 30 '13 at 17:20
  • 1
    @Roland: most of the reason for using data.table is memory efficiency, due to not copying tables. Rbind *does* create copies, and can become a huge memory hog with big data... – naught101 Feb 03 '15 at 23:32
  • 1
    @naught101 Please note that my comment is almost two years old. data.table has been improved in that time as has my understanding of it. – Roland Feb 04 '15 at 08:03

1 Answer


To expand on @Frank's comment: if in your particular case you are appending a row, start with a table such as:

set.seed(12345) 
dt1 <- data.table(a=rnorm(5), b=rnorm(5))

The following two calls are equivalent; I find the first easier to read, but the second is faster:

library(microbenchmark)
microbenchmark(
  rbind(dt1, list(5, 6)),
  rbindlist(list(dt1, list(5, 6)))
)

As we can see:

                             expr     min      lq  median       uq     max
           rbind(dt1, list(5, 6)) 160.516 166.058 175.089 185.1470 457.735
 rbindlist(list(dt1, list(5, 6))) 130.137 134.037 140.605 149.6365 184.326
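
Note that neither call changes `dt1` itself; both return a new table, so the result has to be assigned if you want to keep it. A minimal sketch, assuming data.table is installed (the name `dt2` is introduced here purely for illustration):

```r
library(data.table)

set.seed(12345)
dt1 <- data.table(a = rnorm(5), b = rnorm(5))

# rbindlist() returns a new data.table; dt1 is left untouched
dt2 <- rbindlist(list(dt1, list(5, 6)))

nrow(dt1)  # dt1 still has its original 5 rows
nrow(dt2)  # dt2 has 6 rows, with the new row last
```

Assigning back to the same name (`dt1 <- rbindlist(...)`) works equally well when the original table is no longer needed.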

If you want to insert the row elsewhere, the following will work, but it's not pretty:

rbindlist(list(dt1[1:3, ], list(5, 6), dt1[4:5, ]))

or even

rbindlist(list(dt1[1:3, ], as.list(c(5, 6)), dt1[4:5, ]))

giving:

            a          b
1:  0.5855288 -1.8179560
2:  0.7094660  0.6300986
3: -0.1093033 -0.2761841
4:  5.0000000  6.0000000
5: -0.4534972 -0.2841597
6:  0.6058875 -0.9193220
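
If you need this more than once, the split-and-rebind pattern can be wrapped in a small function. The sketch below is only an illustration: `insert_row()` is a hypothetical helper, not part of data.table, and it assumes `row` supplies one value per column in column order.

```r
library(data.table)

# insert_row() is a hypothetical helper, not a data.table function.
# It returns a new table with `row` placed immediately after row `after`
# (after = 0 puts the new row first).
insert_row <- function(dt, row, after) {
  stopifnot(after >= 0, after <= nrow(dt))
  idx_head <- seq_len(after)                    # rows before the insertion point
  idx_tail <- seq_len(nrow(dt) - after) + after # rows after it
  # name the new row so its columns line up with those of dt
  row <- setNames(as.list(row), names(dt))
  rbindlist(list(dt[idx_head], row, dt[idx_tail]))
}

dt1 <- data.table(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 50))
dt2 <- insert_row(dt1, list(5, 6), after = 3)
dt2  # the row (5, 6) now sits in position 4
```

Like `rbindlist()` itself, this builds a new table rather than modifying `dt1` by reference.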

If you are modifying a row in place (which is the preferred approach), you will need to define the size of the data.table in advance, i.e.

dt1 <- data.table(a=rnorm(6), b=rnorm(6))
set(dt1, i=6L, j="a", value=5) # refer to column by name
set(dt1, i=6L, j=2L, value=6) # refer to column by number
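
The same idea extends to filling several rows one at a time: preallocate more rows than you expect to need, write into them with `set()`, and trim the unused rows at the end. A rough sketch, where the upper bound `n_max`, the counter `n_used`, and the values written are all assumptions made for illustration:

```r
library(data.table)

n_max <- 10L   # assumed upper bound on the number of rows
dt <- data.table(a = rep(NA_real_, n_max), b = rep(NA_real_, n_max))

n_used <- 0L   # number of rows filled so far
for (k in 1:3) {
  n_used <- n_used + 1L
  # set() assigns by reference, so dt is not copied on each iteration
  set(dt, i = n_used, j = "a", value = as.numeric(k))
  set(dt, i = n_used, j = "b", value = as.numeric(k)^2)
}

idx_used <- seq_len(n_used)
dt <- dt[idx_used]   # keep only the rows that were actually filled
```

Only the final subsetting step makes a copy, and it happens once rather than once per row.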

Thanks @Boxuan, I have modified this answer to take account of your suggestion, which is a little faster and easier to read.

dardisco
  • 1
    Why not use `rbindlist(list(dt1, list(5,6)))` as your second option for slightly better readability? – Boxuan Aug 03 '15 at 15:03
  • 1
    @Boxuan `rbindlist(list(dt1, list(c(5, 6))))` gives me `Error ... Item 2 has 1 columns, inconsistent with item 1 which has 2 columns...` See output from `list(c(5, 6))`, which is a list with one element vs. `as.list(c(5, 6))` which has two. – dardisco Aug 04 '15 at 02:06
  • That's strange. I am able to run it on my machine. Maybe we have different versions? `packageVersion("data.table")` gives me `[1] '1.9.4'` and `getRversion()` gives me `[1] '3.2.0'`. – Boxuan Aug 04 '15 at 15:32
  • 1
    maybe this is the culprit: I am suggesting `list(5,6)` instead of `list(c(5, 6))`. =) – Boxuan Aug 04 '15 at 15:34
  • Yes; sorry, I missed that the first time. I agree with your helpful comments. – dardisco Aug 04 '15 at 19:37