Part of a function I am including in an R package involves filling NAs with the last observation carried forward (locf). The locf should be applied to all columns in the data frame except the ones I call the good columns, goodcols, below (i.e. it should be applied only to the badcols). The names of the badcols can be anything. I use the locf function below and a for-loop to achieve this. However, the for-loop is slow on large data sets. Can anybody suggest a faster alternative, or another way of filling in the NAs in this scenario?
Here is an example data frame:
#Test df
TIME <- c(0,5,10,15,20,25,30,40,50)
AMT <- c(50,0,0,0,50,0,0,0,0)
COV1 <- c(10,9,NA,NA,5,5,NA,10,NA)
COV2 <- c(20,15,15,NA,NA,10,NA,30,NA)
ID <- rep(1, times=length(TIME))
df <- data.frame(ID,TIME,AMT,COV1,COV2)
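# expand.grid() returns every combination of the column values (9^5 = 59049 rows),
# which enlarges the test data
df <- expand.grid(df)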
goodcols <- c("ID","TIME","AMT")
# indices of all columns not listed in goodcols
badcols <- which(!names(df) %in% goodcols)
#----------------------------------------------------
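#locf function
locf <- function(x) {
  good <- !is.na(x)
  positions <- seq_along(x)
  # position of each non-NA value, 0 where x is NA
  good.positions <- good * positions
  # running maximum = position of the most recent non-NA value
  last.good.position <- cummax(good.positions)
  # leading NAs have no earlier value to carry, so they stay NA
  last.good.position[last.good.position == 0] <- NA
  x[last.good.position]
}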
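# Worked trace (hand-computed for illustration, not part of the original code):
#   x                  = NA  3 NA NA  7 NA
#   good.positions     =  0  2  0  0  5  0
#   cummax(...)        =  0  2  2  2  5  5
#   last.good.position = NA  2  2  2  5  5
#   locf(x)            = NA  3  3  3  7  7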
#------------------------------------------------------
#Now fill in the gaps with the locf function
for (i in badcols) {
  df[, i] <- locf(df[, i])
}
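One possible variant, offered as a sketch and not benchmarked here: instead of assigning one column per loop iteration, apply locf to all bad columns with lapply() and write the resulting list back in a single df[badcols] <- ... assignment. The block repeats the locf() above and the unexpanded test data so it runs on its own; selecting badcols by name with setdiff() is an assumption that serves as an alternative to the which() indices used in the question.

```r
# Sketch: fill all "bad" columns in one assignment instead of a for-loop.
# Reuses the question's locf() and test data; setdiff()-based badcols is an assumption.

locf <- function(x) {
  good <- !is.na(x)
  # position of the most recent non-NA value at each index (0 = none yet)
  last.good.position <- cummax(good * seq_along(x))
  last.good.position[last.good.position == 0] <- NA
  x[last.good.position]
}

TIME <- c(0, 5, 10, 15, 20, 25, 30, 40, 50)
AMT  <- c(50, 0, 0, 0, 50, 0, 0, 0, 0)
COV1 <- c(10, 9, NA, NA, 5, 5, NA, 10, NA)
COV2 <- c(20, 15, 15, NA, NA, 10, NA, 30, NA)
ID   <- rep(1, times = length(TIME))
df   <- data.frame(ID, TIME, AMT, COV1, COV2)

goodcols <- c("ID", "TIME", "AMT")
badcols  <- setdiff(names(df), goodcols)  # every column not listed as good

# lapply() returns a list of filled columns; assigning it to df[badcols]
# replaces all bad columns in one data-frame update.
df[badcols] <- lapply(df[badcols], locf)

print(df)
```

Because locf() is already vectorised, any speed-up from dropping the loop most likely comes from fewer `[<-.data.frame` calls, so it may only matter when there are many bad columns. Packaged alternatives such as zoo::na.locf(x, na.rm = FALSE) (which keeps leading NAs) or data.table::nafill(x, type = "locf") may also be worth timing on real data; they are not tested here.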