Create new variable based on prior observation value from another column

Question

I am constructing a new variable that’s value is contingent on the prior row in another column. Therefore, the order of the data is important. This is how my data currently looks

ID  Cong  Comm    Y
 1   52    3      0
 1   53    3      0
 1   54    3      1
 1   53    4      1
 2   50    2      1
 2   50    7      1
 3   48    4      1
 4   48    3      1
 4   48    7      0
 4   49    7      1

I would like to create a new variable called Y2. If the observation’s Y=0, then Y2 in the same observation should equal 1. If the following row’s has Y=0, then add 1 to the previous Y2 value (the Y2 value for this observation should equal 2). Continue this process until Y=1, add 1, and then stop the process. Essentially, the new variable counts up until the other column's value equals "1" and repeats the process.

This is what it should look like:

ID  Cong  Comm    Y   Y2
 1   52    3      0   1
 1   53    3      0   2
 1   54    3      1   3 
 1   53    4      1   1
 2   50    2      1   1
 2   50    7      1   1
 3   48    4      1   1
 4   48    3      1   1
 4   48    7      0   1 
 4   49    7      1   2

Here is my sample dataframe.

data.frame(
ID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L), 
Cong = c(52L, 53L, 54L, 53L, 50L, 50L, 48L, 48L, 48L, 49L), 
Comm = c(3L, 3L, 3L, 4L, 2L, 7L, 4L, 3L, 7L, 7L),
Y=c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L))

Would a loop or if-else command be the best way to tackle this? I tried an if-else statement, but my code did not work. Any recommendations would be great.

score 0 · Accepted Answer · edited May 23 '17 at 10:29

You can do this like this, supposing your data.frame is df:

y = df$Y
bool=y==c(0, head(y, -1))
y[which(bool %in% F)] = 0

df$Y2 = ifelse(y==0, f7(!y), 1)

#   ID Cong Comm Y Y2
#1   1   52    3 0  1
#2   1   53    3 0  2
#3   1   54    3 1  3
#4   1   53    4 1  1
#5   2   50    2 1  1
#6   2   50    7 1  1
#7   3   48    4 1  1
#8   4   48    3 1  1
#9   4   48    7 0  1
#10  4   49    7 1  2

The trick is done with:

f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

Entirely defined in this great post:

count how many consecutive values are true

Finally this solution is entirely vectorized, no loop.

Create new variable based on prior observation value from another column

1 Answers1