This isn't a dupe of this. That question deals with rows which already have NAs in them, my question deals with missing rows for which there should be a data point of 0.
Let's say I have this data.table
dt<-data.table(id=c(1,2,4,5,6,1,3,4,5,6),
varname=c(rep('banana',5),rep('apple',5)),
thedata=runif(10,1,10))
What's the best way to add, for each varname
, the missing id
s with a 0 for thedata
?
At the moment I dcast with fill=0
and then melt again but this doesn't seem very efficient.
melt(dcast.data.table(dt,id~varname,value.var='thedata',fill=0),id.var='id',variable.factor=FALSE,variable.name='varname',value.name='thedata')
I also just thought of doing it this way but it gets a little clunky to fill in NAs at the end
merge(dt[,CJ(id=unique(id),varname=unique(varname))],dt,by=c('varname','id'),all=TRUE)[,.(varname,id,thedata=ifelse(!is.na(thedata),thedata,0))]
In this example, I only used one id
column but any additional suggestion should be extensible to having more than one id
column.
EDIT
I did a system.time
on each approach with a largish data set and the melt/cast approach took between 2-3 seconds while the merge/CJ approach took between 12-13.
EDIT2
Roland's CJ approach is much better than mine as it only took between 4-5 seconds with my dataset.
Is there a better way to do this?