I'm an R newbie (ish). I've written some code that uses a `for()` loop, and I want to rewrite it in vectorised form, but my attempt isn't working.
Here is a simplified example to illustrate:
```r
library(dplyr)

x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

## if year is blank and name is same as name from previous row
##   take year from previous row
## else
##   stick with the year you already have

# 1. Run as a loop
x$year_2 <- NA
x$year_2[1] <- x$year[1]
for (row_idx in 2:nrow(x)) {
  if (is.na(x$year[row_idx]) && x$name[row_idx] == x$name[row_idx - 1]) {
    # carry forward the value already computed for the previous row
    x$year_2[row_idx] <- x$year_2[row_idx - 1]
  } else {
    x$year_2[row_idx] <- x$year[row_idx]
  }
}
```
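The rule in the loop amounts to carrying the last known `year` forward within each consecutive run of the same `name` (last observation carried forward). A minimal sketch of that idea using dplyr plus `tidyr::fill()`, assuming the tidyr package is installed; the helper column `run` is a hypothetical name introduced only for this illustration:

```r
library(dplyr)
library(tidyr)

x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

x <- x %>%
  # hypothetical helper: id that increases whenever name changes from the row above
  mutate(run = cumsum(name != lag(name, default = first(name)))) %>%
  group_by(run) %>%
  mutate(year_2 = year) %>%
  # fill() works within each group, so values never leak across runs
  fill(year_2, .direction = "down") %>%
  ungroup() %>%
  select(-run)

x
```

Grouping by a consecutive-run id rather than by `name` keeps the behaviour identical to the loop even if a name reappears later in the data; with the toy data above, `group_by(name)` would give the same result.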
```r
# 2. Attempt to vectorise
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))
# lag() looks at the previous row; note that x$year_2 does not exist yet at this point
x$year_2 <- ifelse(is.na(x$year) & x$name == lag(x$name),
                   lag(x$year_2),
                   x$year)
```
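One way to avoid the circularity in the `ifelse()` attempt is to compute, for each row, the position of the most recent non-missing `year` within the current run of names, and then index `year` with that position. A base R sketch of this approach (the names `run_id`, `known_pos` and `last_known` are illustrative, not from any package):

```r
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

# Run id: increases by one whenever name differs from the row above
run_id <- cumsum(c(TRUE, x$name[-1] != x$name[-nrow(x)]))

# Row position where year is known, 0 where year is NA
known_pos <- ifelse(is.na(x$year), 0L, seq_len(nrow(x)))

# Within each run, the running maximum is the position of the latest known year
last_known <- ave(known_pos, run_id, FUN = cummax)

# 0 means no known year yet in this run; turn it into NA so the result is NA
x$year_2 <- x$year[replace(last_known, last_known == 0L, NA)]

x
```

Because positions only increase within a run, `cummax()` always points at the latest known row, so no value depends on a previously computed `year_2`.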
I think the vectorised version is being messed up because there's a circularity to it (i.e. `x$year_2` appears on both sides of the `<-`, so each row's value depends on the value just computed for the row above). Is there a way to get around this?
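If the goal is to keep the row-by-row recurrence exactly as written, base R's `Reduce()` with `accumulate = TRUE` passes each step's result into the next step, which mirrors the `x$year_2[row_idx - 1]` dependency in the loop. This is still sequential internally, so it is a tidier way to write the recurrence rather than a true vectorisation. A sketch, where `same_name` is an illustrative helper:

```r
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

# TRUE where a row has the same name as the row above (row 1 has no predecessor)
same_name <- c(FALSE, x$name[-1] == x$name[-nrow(x)])

# prev is the result for the previous row; init seeds it with the first year
x$year_2 <- Reduce(
  function(prev, i) if (is.na(x$year[i]) && same_name[i]) prev else x$year[i],
  seq_len(nrow(x))[-1],
  init = x$year[1],
  accumulate = TRUE
)

x
```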
Thank you.