I have the following:

df <- data.frame(A = c(1:8), ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")
df$new.date <- NA

I would like to update new.date such that for any given row, new.date is equal to ref.date if ref.date is not NA, and is equal to the value of new.date in the previous row if ref.date is NA. So the result would be:

A  ref.date  new.date
1      <NA>      <NA>
2  10/12/18  10/12/18
3      <NA>  10/12/18
4      <NA>  10/12/18
5  12/15/19  12/15/19
6      <NA>  12/15/19
7      <NA>  12/15/19
8      <NA>  12/15/19

I tried

library(dplyr)
df <- df %>% mutate(new.date = ifelse(is.na(ref.date), lag(new.date), ref.date))
df$new.date <- as.Date(df$new.date, format = "%m/%d/%y")

But this yielded dates in numeric format and did not fill rows correctly where ref.date is NA.

marcel

3 Answers

I think this should do it:

df <- data.frame(A = c(1:8), ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")
df$new.date <- NA

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

df %>%
  mutate(new.date = ref.date) %>% 
  fill(`new.date`, .direction = "down")
#>   A   ref.date   new.date
#> 1 1       <NA>       <NA>
#> 2 2 2018-10-12 2018-10-12
#> 3 3       <NA> 2018-10-12
#> 4 4       <NA> 2018-10-12
#> 5 5 2019-12-15 2019-12-15
#> 6 6       <NA> 2019-12-15
#> 7 7       <NA> 2019-12-15
#> 8 8       <NA> 2019-12-15
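
As background on why the ifelse() attempt in the question misbehaved, here is a minimal base R sketch. The names x, as_number, new_date, shifted and restored are made up for illustration, and the one-row shift is written with head() as a stand-in for what a lag does, so treat it as an approximation of the original call rather than a reproduction of it.

```r
# Hypothetical three-row stand-in for df$ref.date.
x <- as.Date(c(NA, "2018-10-12", NA))

# ifelse() builds its result on top of the logical test vector, so the
# Date class is dropped and the underlying day counts come back instead.
as_number <- ifelse(is.na(x), NA, x)
class(as_number)

# A lag is computed once on the whole column, not row by row. The column
# is all NA at that point, so the shifted copy is all NA as well and
# nothing gets carried forward.
new_date <- rep(NA, length(x))
shifted <- c(NA, head(new_date, -1))
all(is.na(shifted))

# The numbers can be turned back into dates by supplying the origin.
restored <- as.Date(as_number, origin = "1970-01-01")
```

fill() avoids both problems because it carries the last non-missing value forward without rebuilding the column, so the Date class is kept.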
cimentadaj

We can copy ref.date into the new.date column and then use fill() from tidyr, which fills downward by default:

library(dplyr)
df %>% mutate(new.date = ref.date) %>% tidyr::fill(new.date)

#  A   ref.date   new.date
#1 1       <NA>       <NA>
#2 2 2018-10-12 2018-10-12
#3 3       <NA> 2018-10-12
#4 4       <NA> 2018-10-12
#5 5 2019-12-15 2019-12-15
#6 6       <NA> 2019-12-15
#7 7       <NA> 2019-12-15
#8 8       <NA> 2019-12-15
Ronak Shah

Here are some base R solutions.

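  • using rle() + cumsum():
# cumsum(!is.na(...)) labels each run of rows that starts at a non-NA date;
# rle() gives the run lengths, and each run's first date is repeated over it
df$new.date <- with(rle(cumsum(!is.na(df$ref.date))),
                    rep(df$ref.date[c(0, cumsum(lengths[-length(lengths)])) + 1], lengths))
  • using split() + rbind() (run this on a df that has no new.date column yet, since cbind() would otherwise append a second one):
# split the rows into the same runs, attach each run's first date, then restack
df <- do.call(rbind,
              c(make.row.names = FALSE,
                lapply(split(df, cumsum(!is.na(df$ref.date))),
                       function(v) cbind(v, new.date = head(v$ref.date, 1)))))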

Either approach gives:

> df
  A   ref.date   new.date
1 1       <NA>       <NA>
2 2 2018-10-12 2018-10-12
3 3       <NA> 2018-10-12
4 4       <NA> 2018-10-12
5 5 2019-12-15 2019-12-15
6 6       <NA> 2019-12-15
7 7       <NA> 2019-12-15
8 8       <NA> 2019-12-15
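
To make the grouping trick behind both snippets easier to follow, here is a small sketch that spells out the intermediate values. The names ref, grp, lookup and filled are hypothetical, and the lookup-by-index step is a simpler variant of the same idea rather than the code above.

```r
# Hypothetical stand-alone copy of df$ref.date from the question.
ref <- as.Date(c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA),
               format = "%m/%d/%y")

# Every non-NA date starts a new group; the NA rows after it keep the
# same running count. Rows before the first date fall into group 0.
grp <- cumsum(!is.na(ref))
grp
# 0 1 1 1 2 2 2 2

# One lookup entry per group: NA for group 0, then each observed date.
# Starting c() with as.Date(NA) keeps the result a Date vector.
lookup <- c(as.Date(NA), ref[!is.na(ref)])

# Index the lookup by group number (shifted by one because R is 1-based).
filled <- lookup[grp + 1]
filled
```

The result could be assigned with df$new.date <- filled. It relies only on the rows being in the order the fill should follow.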
ThomasIsCoding