I'm trying to get an idea of the availability of my data which might look like:
DT <- data.table(id=rep(c("a","b"),each=20),time=rep(1991:2010,2),
x=rbeta(40,shape1=1,shape2=2),
y=rnorm(40))
#I have some NA's (no gaps):
DT[id=="a"&time<2000,x:=NA]
DT[id=="b"&time>2005,y:=NA]
but is much larger of course. Ideally, I'd like to see a table like this:
a b
x 2000-2010 1991-2010
y 1991-2010 1991-2005
so the non-missing minimum to the non-missing maximun time period. I can get that for one variable:
DT[,availability_x:=paste0(
as.character(min(ifelse(!is.na(x),time,NA),na.rm=T)),
"-",
as.character(max(ifelse(!is.na(x),time,NA),na.rm=T))),
by=id]
But in reality, I want to do that for many variables. All my attempts to do that fail, however, because I'm having a hard time communicating a vector of columns to the data table. My guess is that it goes in the direction of this or this but my attempts to adapt these solutions to a vector of columns failed.
An apply function for example doesn't seem to evaluate the elements of a character vector:
cols <- c("x","y")
availabilityfunction <- function(i){
DT[,paste0("avail_",i):=paste0(
as.character(min(ifelse(!is.na(i),time,NA),na.rm=T)),
"-",
as.character(max(ifelse(!is.na(i),time,NA),na.rm=T))),
by=id]}
lapply(cols,availabilityfunction)