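We can first determine the indices idx of the missing values using which() and then replace only these elements with x[idx-1], i.e. the value directly before each NA. The function ByWhich shows how it works. Because all replacement values are read before any assignment, this matches the loop only when no two NA values are adjacent, as in the sample data. In a run of consecutive NAs, only the first one is filled.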
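To make the two steps concrete, here is the same idea applied by hand to a short hypothetical vector v (not part of the question's data):

```r
# Hypothetical toy vector with one missing value
v <- c("A", "B", NA, "C")
idx <- which(is.na(v))  # idx is 3
v[idx] <- v[idx - 1]    # i.e. v[3] <- v[2]
print(v)
#> [1] "A" "B" "B" "C"
```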
# Sample data
data_final <- data.frame(ID=c(rep("01",12),rep("02",12)), t = rep(1:12,2), x= c(rep("A",3), "B", NA, rep("A",3),rep("C",4),rep("A",5),rep("C",3),NA,"C",rep("A",2)))
# New solution
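ByWhich <- function(x) {
idx <- which(is.na(x))   # positions of the missing values
idx <- idx[idx > 1]      # a leading NA has no predecessor (x[0] is empty); leave it, as ByLoop does
# fill each NA with the element directly before it (a run of NAs is only partly filled)
x[idx] <- x[idx-1]
return(x)
}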
# Solution by asker
ByLoop <- function(x) {
for (i in 2:length(x)) {
x[i] <- ifelse(is.na(x[i]), x[i-1], x[i])
}
return(x)
}
# Test if the functions give identical results
# (identical() returns TRUE/FALSE even if NAs remain, whereas all(a == b) would return NA)
identical(ByLoop(data_final$x), ByWhich(data_final$x))
#> [1] TRUE
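The equality test above passes because the sample data has no two consecutive NA values. A minimal check on a hypothetical vector v with a run of two NAs (both functions are repeated here so the snippet runs on its own) shows where the two approaches diverge:

```r
# ByWhich and ByLoop as in the answer, repeated so this snippet is self-contained
ByWhich <- function(x) {
  idx <- which(is.na(x))
  x[idx] <- x[idx - 1]
  x
}
ByLoop <- function(x) {
  for (i in 2:length(x)) {
    x[i] <- ifelse(is.na(x[i]), x[i - 1], x[i])
  }
  x
}

# Hypothetical vector with two consecutive NA values
v <- c("A", NA, NA, "B")
ByLoop(v)   # returns c("A", "A", "A", "B"): the loop carries "A" through the whole run
ByWhich(v)  # returns c("A", "A", NA, "B"): x[idx - 1] is read before assignment
```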
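The benchmark shows that the solution using which is considerably faster: its median time is about 2.4 microseconds versus about 37 microseconds for the loop, roughly 15 times faster. (The mean for ByWhich is inflated by a single slow run of about 2125 microseconds.)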
library(microbenchmark)
microbenchmark::microbenchmark(
ByWhich = ByWhich(data_final$x),
ByLoop = ByLoop(data_final$x)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> ByWhich 2.001 2.1010 23.60294 2.4010 2.5010 2124.802 100
#> ByLoop 35.400 36.2515 37.16908 37.0005 37.5015 42.301 100
This solution does not require an extra package. However, the zoo or tidyverse solutions provided in the comments are probably even faster.
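If runs of consecutive NAs can occur, a fully vectorised fill-down is still possible in base R. The sketch below is a swapped-in technique, not the answer's method and not the zoo/tidyverse code from the comments: FillDown is a hypothetical helper name, and applying it per ID with ave() assumes the fill should not cross from one ID into the next. cummax() over the positions of the non-missing values gives, at each index, the position of the most recent non-missing value, which is then used to index x.

```r
# Sample data from above (stringsAsFactors = FALSE so x is character on any R version)
data_final <- data.frame(ID = c(rep("01", 12), rep("02", 12)), t = rep(1:12, 2),
                         x = c(rep("A", 3), "B", NA, rep("A", 3), rep("C", 4),
                               rep("A", 5), rep("C", 3), NA, "C", rep("A", 2)),
                         stringsAsFactors = FALSE)

# Hypothetical helper (not from the answer): vectorised last observation carried forward
FillDown <- function(x) {
  nonmissing <- !is.na(x)
  # position of the most recent non-missing value at each index (0 if none yet)
  last <- cummax(seq_along(x) * nonmissing)
  has_prev <- last > 0
  # leading NAs have no earlier value and are left as NA
  x[has_prev] <- x[last[has_prev]]
  x
}

# Handles runs of NAs, unlike a single x[idx - 1] step
FillDown(c(NA, "A", NA, NA, "B"))  # returns c(NA, "A", "A", "A", "B")

# Assumption: the fill should stay within each ID, so apply it group-wise
data_final$x_filled <- ave(data_final$x, data_final$ID, FUN = FillDown)
```

Applying the helper per group matters only when an ID starts with NA; ungrouped, that NA would be filled from the previous ID.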
Created on 2021-05-21 by the reprex package (v2.0.0)