Suppose I have a data frame that looks like this:
ID T X Y Z
1 1 A A NA
1 2 B A NA
1 3 B B NA
1 4 B A NA
2 1 A B NA
2 2 A A NA
2 3 B A NA
2 4 A B NA
3 1 B B NA
3 2 B B NA
3 3 B B NA
3 4 B A NA
I would like to replace the value of Z based on conditions that depend on both the current row's X and Y and the value of Z on previous rows, so that the above ends up looking like this:
ID T X Y Z
1 1 A A 0
1 2 B A 0
1 3 B B 1
1 4 B A NA
2 1 A B 0
2 2 A A 0
2 3 B A 0
2 4 A B 0
3 1 B B 1
3 2 B B NA
3 3 B B NA
3 4 B A NA
The rules:
- Z takes the value 1 the first time (in order by T, and within an ID) that both X and Y on that row have the value B.
- Z takes (or retains) the value NA if and only if it has already taken the value 1 at a smaller value of T within the same ID.
- When T = 1, Z takes the value 0 if X and Y on that row do not both equal B.
- When T > 1, Z takes the value 0 if X and Y on that row do not both equal B, AND the value of Z on the previous row is 0.
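To make the rules concrete, here is a minimal sketch of what I think they amount to: within each ID, Z is 0 before the first row where X and Y are both B, 1 on that row, and NA afterwards. It assumes the data frame is called df and is sorted by ID and then T; the names hit and prior_hits are only illustrative.

```r
# Example data from above, sorted by ID and then T.
df <- data.frame(
  ID = rep(1:3, each = 4),
  T  = rep(1:4, times = 3),
  X  = c("A", "B", "B", "B",  "A", "A", "B", "A",  "B", "B", "B", "B"),
  Y  = c("A", "A", "B", "A",  "B", "A", "A", "B",  "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)
df <- df[order(df$ID, df$T), ]

# hit: 1 where both X and Y equal "B" on the row, 0 otherwise.
hit <- as.integer(df$X == "B" & df$Y == "B")

# prior_hits: number of hits on strictly earlier rows within the same ID.
prior_hits <- ave(hit, df$ID, FUN = function(h) cumsum(h) - h)

# No earlier hit: Z is the row's own hit (0 or 1). Any earlier hit: Z is NA.
df$Z <- ifelse(prior_hits > 0, NA_integer_, hit)
```

On the example data this gives the Z column shown in the desired output above. It relies on the rows being in T order within each ID, because cumsum() works in row order.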
I want the following to work, and it gets me kinda close but no dice:
df$Z <- NA
for (t in 1:4) {
  df$Z[ (df$X=="B" & df$Y=="B") & df$T==1] <- 1
  df$Z[!(df$X=="B" & df$Y=="B") & df$T==1] <- 0
  if (t>1) {
    df$Z[ (df$X=="B" & df$Y=="B") & df$T==t & (!is.na(df$Z[t-1]) & df$Z[t-1]==0)] <- 0
    df$Z[!(df$X=="B" & df$Y=="B") & df$T==t & (!is.na(df$Z[t-1]) & df$Z[t-1]==0)] <- 1
  }
}
On the other hand, I can write a series of nested if ... then statements looping across all observations, but that is excruciatingly slow (at least compared to the program I am translating from in Stata).
I am sure I am committing twelve kinds of gaffes in my attempt above, but a few hours of banging my head on this has not resolved it.
So I come to you begging, hat in hand. :)
Edit: It occurs to me that sharing the Stata code might help with suggestions (it resolves this so much faster than what I have come up with in R, which is ironic given my preference for R over Stata's language :). This does what I want, and does it fast (even with, say, N=1600, T=11):
replace Z = .
forvalues t = 1(1)4 {
    replace Z = 1 if X == "B" & Y == "B" & T == 1
    replace Z = 0 if !(X == "B" & Y == "B") & T == 1
    replace Z = 1 if X == "B" & Y == "B" & T == `t' & Z[_n-1] == 0 & `t' > 1
    replace Z = 0 if !(X == "B" & Y == "B") & T == `t' & Z[_n-1] == 0 & `t' > 1
}
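For comparison, here is a rough sketch of an R loop with the same shape as the Stata loop: one pass per value of T, with each pass vectorised across IDs. It assumes df is sorted by ID and then T and that the panel is balanced (every ID has every T), so the row directly above a T = t row is the same ID at T = t - 1. The names hit, cur, prev and open are only illustrative.

```r
# Example data from above, sorted by ID and then T.
df <- data.frame(
  ID = rep(1:3, each = 4),
  T  = rep(1:4, times = 3),
  X  = c("A", "B", "B", "B",  "A", "A", "B", "A",  "B", "B", "B", "B"),
  Y  = c("A", "A", "B", "A",  "B", "A", "A", "B",  "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)
df <- df[order(df$ID, df$T), ]

# hit: 1 where both X and Y equal "B" on the row, 0 otherwise.
hit <- as.integer(df$X == "B" & df$Y == "B")

# Start with Z missing everywhere, then fill in the T == 1 rows directly.
df$Z <- NA_integer_
first <- df$T == 1
df$Z[first] <- hit[first]

# One pass per later period; seq_len(...)[-1] is empty if there is only T == 1.
for (t in seq_len(max(df$T))[-1]) {
  cur  <- which(df$T == t)   # row positions at period t
  prev <- cur - 1            # row above (assumes a sorted, balanced panel)
  # Only rows whose previous Z is exactly 0 are still "open" for an update.
  open <- !is.na(df$Z[prev]) & df$Z[prev] == 0
  df$Z[cur[open]] <- hit[cur[open]]
}
```

The loop runs once per period rather than once per observation, so the number of iterations grows with T and not with N. Rows whose previous Z is 1 or NA are never touched, which is what leaves them as NA.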