
I am loading some data.tables and want to create some new columns in them.

There is a closely related question on this topic, but it is predicated on manually entering the name of each data.table. Here's my example:

library(data.table)
library(magrittr)

perf_attr = data.table(
    ID          = 1:2, 
    perf_date   = as.IDate("2015-12-18") + 0:1, 
    metro_pop   = 1e4*(1:2)
)

##### this part causes trouble ######
save(perf_attr, file = "tmp.rdata")
rm(perf_attr)
load("tmp.rdata")

add_vars = function(DT = data.table(), vars = list()){
    if (length(vars)) DT[, names(vars) := lapply(vars, . %>% `[[`(2) %>% eval)][]
    DT
} 

vars = list(
    perf_attr = list(  
        const           =   ~1,
        lpop            =   ~log(metro_pop),
        dum_weekend     =   ~weekdays(perf_date) %in% c("Friday", "Saturday")
    )
)

for (DTnm in names(vars)) add_vars(get(DTnm), vars[[DTnm]])

##### new columns should appear here, but don't ######
perf_attr 
#    ID  perf_date metro_pop
# 1:  1 2015-12-18     10000
# 2:  2 2015-12-19     20000
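One way to see what the save/load step changes is to compare truelength() (the number of column slots data.table has reserved) before and after the round trip. This is a minimal sketch: it uses a temporary file instead of "tmp.rdata", and the table DT is a cut-down stand-in for perf_attr.

```r
library(data.table)

# A freshly created data.table is over-allocated: truelength() (column
# slots reserved) exceeds length() (columns in use). Those spare slots
# are what let := add columns in place.
DT <- data.table(ID = 1:2, metro_pop = 1e4 * (1:2))
tl_fresh <- truelength(DT)

# Round-trip the table through save()/load(), as in the question,
# using a temporary file instead of "tmp.rdata".
f <- tempfile(fileext = ".rdata")
save(DT, file = f)
rm(DT)
load(f)

# The over-allocation does not survive serialization, so the loaded
# table reports a smaller truelength (?truelength says 0 for tables
# loaded from disk).
tl_loaded <- truelength(DT)
cat("truelength before save:", tl_fresh, "| after load:", tl_loaded, "\n")
```

As ?truelength describes, data.table detects this state and over-allocates the table at the next column addition. Inside add_vars() that reallocated table is bound only to the function's local DT, which would explain why the caller's perf_attr does not get the new columns.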

Comments

  1. The get doesn't seem to be central to the problem, since add_vars(perf_attr, vars$perf_attr) also fails.

  2. If you skip the save/load part, it works fine (perf_attr is modified by reference). It also works if I don't use a function, like:

    perf_attr[, names(vars$perf_attr) := lapply(vars$perf_attr, . %>% `[[`(2) %>% eval)]
    
  3. I'm used to loaded data.tables having broken internal selfref pointers, but I'm not sure how to repair them to make this work. After the load line I tried various hacks like lapply(mget(tables()$NAME), f) and for (DTnm in tables()$NAME){stuff}, to no avail.
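A sketch of a workaround that repairs nothing: since add_vars() already returns DT, the loop can rebind each name to that return value instead of relying on modification by reference reaching the caller. The helper add_vars_ret() is hypothetical. It evaluates each formula's right-hand side with eval(rhs, DT) in place of the question's magrittr pipeline, and the example leaves out dum_weekend because weekdays() returns locale-dependent day names.

```r
library(data.table)

# Hypothetical helper (not from the question): evaluate each formula's
# right-hand side with the table's columns in scope, then add the results
# as new columns. It returns DT because := may have had to reallocate the
# table, in which case only this local binding holds the new columns.
add_vars_ret <- function(DT, vars) {
  if (length(vars)) {
    vals <- lapply(vars, function(fml) eval(fml[[2]], DT))
    DT[, (names(vars)) := vals]
  }
  DT
}

perf_attr <- data.table(
  ID        = 1:2,
  perf_date = as.IDate("2015-12-18") + 0:1,
  metro_pop = 1e4 * (1:2)
)

# Reproduce the problematic round trip through a temporary file.
f <- tempfile(fileext = ".rdata")
save(perf_attr, file = f)
rm(perf_attr)
load(f)

vars <- list(perf_attr = list(const = ~1, lpop = ~log(metro_pop)))

# Rebind each table to the function's return value.
for (DTnm in names(vars)) {
  assign(DTnm, add_vars_ret(get(DTnm), vars[[DTnm]]))
}

print(perf_attr)
```

When the table was not loaded from disk, := modifies it in place and the function returns the same object, so the extra assign() is harmless in that case too.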

Frank
  • Just re-stumbled on the linked question and am rereading it; I might delete this question if I find an answer that way. – Frank May 26 '16 at 19:14
  • Okay, problem solved-ish. Not sure whether I should delete this, close it as a dupe, or just leave it be and update it once https://github.com/Rdatatable/data.table/issues/1017 is fixed. – Frank May 26 '16 at 19:19

1 Answer


Well, looking again at the linked answer, I came up with this loop to insert after load:

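# After load(): rebind each data.table listed by tables() to an
# over-allocated copy, so := can add columns by reference again
for (DTnm in tables()$NAME){
    assign(DTnm, alloc.col(get(DTnm)))
}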

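Of course, this tweaks every data.table that tables() finds (by default, all data.tables in the global environment), not only the ones that were just loaded.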
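A narrower sketch, assuming only the freshly loaded objects need repair: load() invisibly returns the names of the objects it created, so the loop can be restricted to those. It calls setalloccol(), which recent data.table versions document alongside alloc.col() in ?truelength; on an older version, alloc.col() can be substituted. The add_const() function is a hypothetical check, not part of the original answer.

```r
library(data.table)

perf_attr <- data.table(ID = 1:2, metro_pop = 1e4 * (1:2))

# Round trip through a temporary file, as in the question.
f <- tempfile(fileext = ".rdata")
save(perf_attr, file = f)
rm(perf_attr)

# load() invisibly returns the names of the objects it created, so the
# repair can be limited to those instead of everything tables() lists.
loaded <- load(f)
for (nm in loaded) {
  if (is.data.table(get(nm))) {
    # The argument is a call, not a bare name, so setalloccol() cannot
    # rebind it by itself; assign() puts the over-allocated table back.
    assign(nm, setalloccol(get(nm)))
  }
}

# Hypothetical check: a function that adds a column by reference now
# changes the caller's table.
add_const <- function(DT) {
  DT[, const := 1]
  invisible(NULL)
}
add_const(perf_attr)
print(perf_attr)
```

After the repair, perf_attr has spare column slots and a valid selfref again, so := inside add_const() works on the same object the caller holds.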

Frank