2

Problem: How can I fill backwards all rows in a group before an occurrence of a certain value. I am not trying to fill in NA or missing value using zoo na.locf. In the following I would like to fill all previous rows in A with 1.00 before the 1.00 occurs by each ID group, ideally using dplyr.

Input:

data<- data.frame(ID=c(1,1,1,1,2,2,2,3,3,3,4,4,4,4,4), 
              time=c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,5),
              A=c(0.10,0.25,1,0,0.25,1,0.25,0,1,0.10,1,0.10,0.10,0.10,0.05))
ID time    A
1    1     0.10
1    2     0.25
1    3     1.00
1    4     0.00
2    1     0.25
2    2     1.00
2    3     0.25
3    1     0.00
3    2     1.00
3    3     0.10
4    1     1.00
4    2     0.10
4    3     0.10
4    4     0.10
4    5     0.05

Desired output:

ID time    A
1    1     1.00
1    2     1.00
1    3     1.00
1    4     0.00
2    1     1.00
2    2     1.00
2    3     0.25
3    1     1.00
3    2     1.00
3    3     0.10
4    1     1.00
4    2     0.10
4    3     0.10
4    4     0.10
4    5     0.05
Jaap
  • 81,064
  • 34
  • 182
  • 193
BEMR
  • 339
  • 1
  • 3
  • 14

2 Answers2

6

After grouping by ID you can check the cumulative sum of 1's and where it's still below 1 (not yet appeared), replace the A-value with 1:

data %>% 
  group_by(ID) %>% 
  mutate(A = replace(A, cumsum(A == 1) < 1, 1))
# Source: local data frame [15 x 3]
# Groups: ID [4]
# 
# ID  time     A
# <dbl> <dbl> <dbl>
# 1      1     1  1.00
# 2      1     2  1.00
# 3      1     3  1.00
# 4      1     4  0.00
# 5      2     1  1.00
# 6      2     2  1.00
# 7      2     3  0.25
# 8      3     1  1.00
# 9      3     2  1.00
# 10     3     3  0.10
# 11     4     1  1.00
# 12     4     2  0.10
# 13     4     3  0.10
# 14     4     4  0.10
# 15     4     5  0.05

Quite similar, you could also use cummax:

data %>% group_by(ID) %>% mutate(A = replace(A, !cummax(A == 1), 1))

And here's a base R approach:

transform(data, A = ave(A, ID, FUN = function(x) replace(x, !cummax(x == 1), 1)))
talat
  • 68,970
  • 21
  • 126
  • 157
3

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(data)), get the row where 'A' is 1, find the sequence of rows, use that as i to assign (:=) the values in 'A' to 1

library(data.table)
setDT(data)[data[, .I[seq_len(which(A==1))], ID]$V1, A := 1][]
#   ID time    A
# 1:  1    1 1.00
# 2:  1    2 1.00
# 3:  1    3 1.00
# 4:  1    4 0.00
# 5:  2    1 1.00
# 6:  2    2 1.00
# 7:  2    3 0.25
# 8:  3    1 1.00
# 9:  3    2 1.00
#10:  3    3 0.10
#11:  4    1 1.00
#12:  4    2 0.10
#13:  4    3 0.10
#14:  4    4 0.10
#15:  4    5 0.05

Or we can use ave from base R

data$A[with(data, ave(A==1, ID, FUN = cumsum)<1)] <- 1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • @docendodiscimus Thanks man, you are right. I should have been more careful.. I think it would be `setDT(data)[data[, .I[cumsum(A==1) < 1], ID]$V1, A := 1]` – akrun Jun 01 '17 at 12:56