My (simplified) data looks like this:
id = sample(1:20, 5)
first_active = c(1,1,1,2,3)
week1 = c(1,1,1,0,0)
week2 = c(1,0,0,1,0)
week3 = c(1,0,1,0,1)
week4 = c(1,0,0,0,1)
week5 = c(0,0,0,0,1)
df = data.frame(id, first_active, week1, week2, week3, week4, week5)
I want to create a for loop that would, in the same data.frame, create columns p1, p2, ... corresponding to the columns week1, week2, ..., and populate them as follows:
i) if the corresponding week value is not 0, then "active";
ii) if the value for a given week is 0, check the status in the previous p-column: if p[i-1] == "active", then "lapsed1";
iii) if the value for a given week is 0 and p[i-1] == "lapsed[j]", then "lapsed[j+1]";
iv) otherwise, NA.
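Read per customer, rules i) to iv) amount to counting the weeks since the last active week: the count resets to 0 in an active week, grows by 1 in each inactive week after that, and is undefined (NA) before the first active week. A minimal row-by-row sketch of that reading, using a hypothetical helper row_statuses() that takes one customer's week values in order:

```r
# Hypothetical helper: p-statuses for one customer, given that customer's week values in order
row_statuses <- function(weeks) {
  status <- character(length(weeks))
  lapsed <- NA_integer_            # weeks since last activity; NA before the first active week
  for (k in seq_along(weeks)) {
    if (weeks[k] != 0) {
      lapsed <- 0L                 # rule i): active this week
      status[k] <- "active"
    } else if (!is.na(lapsed)) {
      lapsed <- lapsed + 1L        # rules ii) and iii): one more lapsed week
      status[k] <- paste0("lapsed", lapsed)
    } else {
      status[k] <- NA              # rule iv): not active yet
    }
  }
  status
}

row_statuses(c(1, 0, 1, 0, 0))     # "active" "lapsed1" "active" "lapsed1" "lapsed2"
```

Applied to the example, t(apply(df[paste0("week", 1:5)], 1, row_statuses)) should give the p1 to p5 values as a character matrix with one row per customer.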
This would be the solution for the example above (using mutate() from dplyr):
library(dplyr)
df %>%
  mutate(p1 = ifelse(week1 != 0, "active", NA),
         p2 = ifelse(week2 != 0, "active",
                     ifelse(p1 == "active", "lapsed1", NA)),
         p3 = ifelse(week3 != 0, "active",
                     ifelse(p2 == "lapsed1", "lapsed2",
                            ifelse(p2 == "active", "lapsed1", NA))),
         p4 = ifelse(week4 != 0, "active",
                     ifelse(p3 == "lapsed2", "lapsed3",
                            ifelse(p3 == "lapsed1", "lapsed2",
                                   ifelse(p3 == "active", "lapsed1", NA)))),
         p5 = ifelse(week5 != 0, "active",
                     ifelse(p4 == "lapsed3", "lapsed4",
                            ifelse(p4 == "lapsed2", "lapsed3",
                                   ifelse(p4 == "lapsed1", "lapsed2",
                                          ifelse(p4 == "active", "lapsed1", NA)))))
  )
id first_active week1 week2 week3 week4 week5 p1 p2 p3 p4 p5
9 1 1 1 1 1 0 active active active active lapsed1
5 1 1 0 0 0 0 active lapsed1 lapsed2 lapsed3 lapsed4
14 1 1 0 1 0 0 active lapsed1 active lapsed1 lapsed2
3 2 0 1 0 0 0 <NA> active lapsed1 lapsed2 lapsed3
8 3 0 0 1 1 1 <NA> <NA> active active active
I want to create a function or for loop that does this automatically, as my original data has dozens of 'week' columns.
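For many week columns, a loop is not strictly needed. A sketch of one alternative (a swapped-in technique, not the loop below), assuming the week columns are named week1, week2, ...: carry an integer "weeks since last active" count from column to column with base R's Reduce(accumulate = TRUE), then turn the counts into labels. The function name add_status_columns() is hypothetical.

```r
# Hypothetical function: adds p1, p2, ... for every weekN column of df
add_status_columns <- function(df) {
  week_cols <- grep("^week[0-9]+$", names(df), value = TRUE)
  week_cols <- week_cols[order(as.integer(sub("^week", "", week_cols)))]  # week10 after week9

  # Weeks since last activity per row: 0 = active this week, NA = not active yet
  step <- function(count, week) ifelse(week != 0, 0L, count + 1L)
  counts <- Reduce(step, df[week_cols], init = rep(NA_integer_, nrow(df)),
                   accumulate = TRUE)[-1]   # drop the all-NA starting value

  # 0 -> "active", j -> "lapsed<j>", NA stays NA
  labels <- lapply(counts, function(n) ifelse(n == 0L, "active", paste0("lapsed", n)))
  df[paste0("p", seq_along(week_cols))] <- labels
  df
}
```

Usage would be df <- add_status_columns(df). Because the count is kept as a number rather than parsed from a label, it can grow as far as the data require.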
What I managed to get so far is:
df$p1 = ifelse(df$week1 > 0, "active", NA) # initiating the first p-column
for(i in 2:(ncol(df)-2)) { # defining the number of periods dynamically
  column_to_write = paste0("p", i, sep="") # column to be populated
  prev_column = paste0("p", i-1, sep="") # previous p-column to the one being populated
  orig_column = paste0("week", i, sep="") # reference 'week' column
  j = 1 # initiating 'lapsed' number
  df[column_to_write] = ifelse(df[orig_column] > 0, "active",
                          ifelse(df[prev_column] == "active", paste("lapsed", j, sep=""),
                            ifelse(df[prev_column] == paste0("lapsed", j, sep=""), paste0("lapsed", j=j+1, sep=""), NA)))
}
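In this loop, j is set back to 1 at the start of every iteration, so the innermost ifelse() only ever compares against "lapsed1". In addition, j=j+1 inside paste0() does not update j: inside a function call, name = value is parsed as a named argument, so paste0() simply receives the value j + 1. A quick check of both points (the variable label is only for illustration):

```r
j <- 1
label <- paste0("lapsed", j = j + 1)  # j = j + 1 is a named argument here, not an assignment
label                                 # "lapsed2"
j                                     # still 1
paste0("p", 2, sep = "")              # "p2": paste0() has no sep argument, so "" is pasted as an extra string
```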
but this only gives me "lapsed2" as the maximum value, and it creates new columns called week[i] rather than p[i]:
id first_active week1 week2 week3 week4 week5 p1 week2 week3 week4 week5
9 1 1 1 1 1 0 active active active active lapsed1
5 1 1 0 0 0 0 active lapsed1 lapsed2 <NA> <NA>
14 1 1 0 1 0 0 active lapsed1 active lapsed1 lapsed2
3 2 0 1 0 0 0 <NA> active lapsed1 lapsed2 <NA>
8 3 0 0 1 1 1 <NA> <NA> active active active
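The week[i] names are most likely a side effect of single-bracket indexing: df[orig_column] is a one-column data frame, so df[orig_column] > 0 is a one-column logical matrix whose column name is, for example, "week2", and ifelse() returns a result with the same matrix shape and dimnames as its test. Extracting and assigning plain vectors with [[ ]] avoids this. A small sketch on a toy data frame d (not the question's data):

```r
d <- data.frame(week2 = c(1, 0, 3))   # toy data, not the question's df

cmp <- d["week2"] > 0                 # d["week2"] is a data frame, so this is a 1-column logical matrix
colnames(cmp)                         # "week2"
dim(ifelse(cmp, "active", NA))        # 3 1: ifelse() keeps the matrix shape of its test

# With [[ ]], both sides are plain vectors and the new column gets exactly the name given
d[["p2"]] <- ifelse(d[["week2"]] > 0, "active", NA)
names(d)                              # "week2" "p2"
```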
How do I change the code so that the numbers in the "lapsed" values keep rising beyond 2?
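A minimal sketch of one possible fix that keeps the structure of the loop: instead of a separate counter j, read the lapsed number back from the previous p-column ("active" counts as 0, "lapsed<j>" as j) and add 1. It assumes the week columns are named week1 to weekN without gaps, and it uses [[ ]] so that each new column is a plain character vector named p1, p2, and so on. The example data are the question's, with the sampled ids fixed to those in the printed output.

```r
# Example data from the question, with the sampled ids fixed to those in the printed output
df <- data.frame(id = c(9, 5, 14, 3, 8), first_active = c(1, 1, 1, 2, 3),
                 week1 = c(1, 1, 1, 0, 0), week2 = c(1, 0, 0, 1, 0),
                 week3 = c(1, 0, 1, 0, 1), week4 = c(1, 0, 0, 0, 1),
                 week5 = c(0, 0, 0, 0, 1))

# Count the week columns by name instead of relying on ncol(df)
n_weeks <- length(grep("^week[0-9]+$", names(df)))

df$p1 <- ifelse(df$week1 > 0, "active", NA)    # initiating the first p-column
for (i in seq_len(n_weeks)[-1]) {              # i = 2, ..., n_weeks (no iterations if only one week)
  prev <- df[[paste0("p", i - 1)]]             # previous p-column as a plain vector
  week <- df[[paste0("week", i)]]              # reference week column
  # Lapsed number carried over: "active" -> 0, "lapsed<j>" -> j, NA -> NA
  j <- ifelse(prev == "active", 0L,
              suppressWarnings(as.integer(sub("^lapsed", "", prev))))
  df[[paste0("p", i)]] <- ifelse(week > 0, "active",
                                 ifelse(is.na(j), NA_character_, paste0("lapsed", j + 1L)))
}
df
```

Since j is recomputed from the data in every iteration, the lapsed number is no longer capped at 2.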
Thanks for your help! Kasia