How to use apply function over character vectors inside data.table

Question

I'm trying to get an idea of the availability of my data which might look like:

DT <- data.table(id=rep(c("a","b"),each=20),time=rep(1991:2010,2),
             x=rbeta(40,shape1=1,shape2=2),
             y=rnorm(40))
#I have some NA's (no gaps):
DT[id=="a"&time<2000,x:=NA]
DT[id=="b"&time>2005,y:=NA]

but is much larger of course. Ideally, I'd like to see a table like this:

       a         b
x 2000-2010  1991-2010
y 1991-2010  1991-2005

so the non-missing minimum to the non-missing maximun time period. I can get that for one variable:

DT[,availability_x:=paste0(
        as.character(min(ifelse(!is.na(x),time,NA),na.rm=T)),
        "-",
        as.character(max(ifelse(!is.na(x),time,NA),na.rm=T))),
    by=id]

But in reality, I want to do that for many variables. All my attempts to do that fail, however, because I'm having a hard time communicating a vector of columns to the data table. My guess is that it goes in the direction of this or this but my attempts to adapt these solutions to a vector of columns failed.

An apply function for example doesn't seem to evaluate the elements of a character vector:

cols <- c("x","y")

availabilityfunction <- function(i){
  DT[,paste0("avail_",i):=paste0(
       as.character(min(ifelse(!is.na(i),time,NA),na.rm=T)),
       "-",
       as.character(max(ifelse(!is.na(i),time,NA),na.rm=T))),
    by=id]}
lapply(cols,availabilityfunction)

An alternative: `melt(DT, id = c("id", "time"))[!is.na(value), as.list(summary(time)), by=.(id, variable)]` or `range` in place of `summary`. — Frank, Apr 18 '17 at 16:21

score 3 · Accepted Answer · answered Apr 18 '17 at 16:06

3

We can loop (lapply) through the columns of interest specified in .SDcols after grouping by 'id', create a logical index of non-NA elements (!is.na), find the numeric index (which), get the range (i.e. min and max), use that to subset the 'time' column and paste the time elements together.

DT[, lapply(.SD, function(x) paste(time[range(which(!is.na(x)))], 
                      collapse="-")), by = id, .SDcols = x:y]
#   id         x         y
#1:  a 2000-2010 1991-2010
#2:  b 1991-2010 1991-2005

answered Apr 18 '17 at 16:06

akrun

874,273
37
540
662

that looks really good. But I get the following error: `Error in time[range(which(!is.na(x)))] : object of type 'closure' is not subsettable` which I usually get when I look for something that doesn't exist. Maybe its the double naming of x in the function and the data.table? – Jakob Apr 18 '17 at 16:53
@PeterPan Is this error based on the example you showed? If that is the case, it is working fine for me. I am using `data.table_1.10.4` – akrun Apr 18 '17 at 16:58
Probably you don't have a column named `time` where you're testing the code, so it's finding the function of that name instead @PeterPan – Frank Apr 18 '17 at 17:01
@akrun yes but with `data.table 1.9.6` and an update error, hmphf... I'll get back to you when I fixed this. Thanks so far! – Jakob Apr 18 '17 at 17:02
@Frank: thanks for sticking around! I change time to year without an effect. Also change the variable name (in case it gets confused with function(x) because the variable is called x) – Jakob Apr 18 '17 at 17:05
@PeterPan That could explain the error as we are also using a variable 'time' that is not included in `.SDcols` (which I think is a recent update in one of the versions) – akrun Apr 18 '17 at 17:09
2

@akrun Yes! Thats it. adding time to an object like `cols <- c("x","y","time")` and changing your last argument to `.SDcols = cols` solves the problem. Thanks! – Jakob Apr 18 '17 at 17:14

How to use apply function over character vectors inside data.table

1 Answers1