Efficient way of appending rows to data.table

Question

I have a data.table that needs to be able to grow by appending rows. I have been trying to combine the results of these two questions, add-a-row-by-reference-at-the-end-of-a-data-table-object and why-is-rbindlist-better-than-rbind, and came up with the following idea: I use rbindlist to expand the number of rows, but only when there are no empty rows left in the data.table. As long as empty rows are available, I just use set() to fill in values.

library(data.table)

dt <- structure(list(OilWellID = 1:5
, Invest.Period = c(1L, 1L, 1L, 1L, 1L)
, Abandon.Period = c(1568L, 1833L, 2529L, 3384L, 1559L)
, Initial.Production = c(942.430661758408, 1354.41458085552, 674.456247827038, 770.618930924684, 922.09160188213)
, D = c(0.02, 0.02, 0.02, 0.02, 0.02))
, .Names = c("OilWellID", "Invest.Period", "Abandon.Period", "Initial.Production", "D"), class = c("data.table", "data.frame"), row.names = c(NA, -5L))

n.oil.well.portfolio <- dt[, .N]

if (!is.na(dt[.N, Invest.Period])) {
    # The data.table is full (its last row is in use): double its size.
    # Typed NAs keep the column types of the new rows identical to the original.
    dt <- rbindlist(
            list( dt,  # original data.table
                    data.table( OilWellID = (n.oil.well.portfolio + 1L):(n.oil.well.portfolio * 2L)
                                , Invest.Period = rep(NA_integer_, n.oil.well.portfolio)
                                , Abandon.Period = rep(NA_integer_, n.oil.well.portfolio)
                                , Initial.Production = rep(NA_real_, n.oil.well.portfolio)
                                , D = rep(NA_real_, n.oil.well.portfolio)
                    )
            )
    )
}
# Otherwise nothing to do: there are still empty rows that set() can fill.

# Example of the set() call: write Invest.Period (column 2) of the first new row
set(dt, i = n.oil.well.portfolio + 1L, j = 2L, value = 4L)

This combination seems to be one of the most efficient ways to do this until it becomes possible to add rows by reference. A nice comparison of different ways of expanding a data.table can be found in this answer to another question.
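
To make the idea easier to reuse, here is a minimal sketch of how the two steps could be wrapped into one function. append_row is a hypothetical helper of my own, not a data.table function, and it assumes that the number of rows already in use (n.used) is tracked separately from nrow(dt), so that the spare rows are not mistaken for data. The caller has to reassign the result, because rbindlist() returns a new object whenever the table is grown.

```r
library(data.table)

# Hypothetical helper (not part of data.table): writes `values` (a named list,
# one element per column to fill) into the first unused row of `dt`.
# When all rows are in use, the table is doubled first with typed NA rows.
append_row <- function(dt, n.used, values) {
  if (n.used >= nrow(dt)) {
    n.new <- max(nrow(dt), 1L)
    # col[NA_integer_] yields an NA of the column's own type,
    # so the spare rows keep the original column classes
    empty <- as.data.table(lapply(dt, function(col) col[rep(NA_integer_, n.new)]))
    dt <- rbindlist(list(dt, empty))
  }
  row <- as.integer(n.used + 1L)
  for (col in names(values)) {
    # set() updates the cell by reference, no copy of dt is made here
    set(dt, i = row, j = col, value = values[[col]])
  }
  dt
}

# Usage with a small toy table (values are illustrative only)
dt <- data.table(OilWellID = 1:2, Invest.Period = c(1L, 1L), D = c(0.02, 0.02))
n.used <- nrow(dt)

dt <- append_row(dt, n.used, list(OilWellID = 3L, Invest.Period = 4L, D = 0.02))
n.used <- n.used + 1L
# dt now has 4 rows: rows 1 to 3 hold data, row 4 is a spare NA row
```

With doubling, the expensive rbindlist() call only happens when the table is full, so most appends are plain set() calls. Whether this beats collecting the rows in a list and calling rbindlist() once at the end depends on the use case and would need to be benchmarked.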

I would like to keep track of a simulated portfolio and additional rows, i.e. assets can be added to the data.table. — hannes101, Oct 21 '16 at 12:18
I don't think it's a duplicate, since I am not only interested in the difference between rbindlist() and set() but would also like to know whether a combination of both is better. In particular, the problem that set() can't be used if there are no empty rows is addressed here. — hannes101, Oct 21 '16 at 12:27
I'm not convinced that you can't avoid this. I would look for alternatives. Maybe you can restructure so that you can at least append columns (thereby making use of over-allocation) instead of rows? — Roland, Oct 21 '16 at 12:34
If I append columns and thereby transpose the data, I can't calculate the necessary results, which are already implemented. Another, more closely related question is https://stackoverflow.com/questions/10790204/how-to-delete-a-row-by-reference-in-data-table where Matt Dowle talks about an efficient insert() command, although I have no idea whether something like that exists in the development tree. — hannes101, Oct 21 '16 at 13:03
It's also rather a duplicate of this http://stackoverflow.com/questions/20689650/how-to-append-rows-to-an-r-data-frame/38052208#38052208 — hannes101, Oct 21 '16 at 13:39
Yeah, that's a good duplicate, though I went with this one because the question posed is data.table specific. Re Matt Dowle's post, there's an open FR as well: https://github.com/Rdatatable/data.table/issues/635 I'm not sure why there isn't a similar FR for insertion... — Frank, Oct 21 '16 at 16:15
Should I delete the question or add some more information from the linked answer? — hannes101, Oct 25 '16 at 09:54
