0

With the data below, I want to expand the rows or each level of the IndID factor so that there are as many rows as there are years between the CptrDt and MortDt, including the starting and ending years. For individuals without a MortDt I hope to fill in the years sequential to 2017.

dat <- data.frame(IndID = c("AAA","BBB","CCC"),
                 CptrDt  = as.Date(c("01-01-2013" ,"01-01-2013", "01-01-2014"),"%m-%d-%Y"),
                 MortDt =  as.Date(c("01-01-2015" ,"01-01-2016", NA),"%m-%d-%Y"))

> dat
  IndID     CptrDt     MortDt
1   AAA 2013-01-01 2015-01-01
2   BBB 2013-01-01 2016-01-01
3   CCC 2014-01-01       <NA>

The simplified result would return only the year as shown below, but I can work with other date formats.

Result <- data.frame(IndID = c(rep("AAA",3), rep("BBB",4), rep("CCC",4)),
                 Year = c(2013,2014,2015,2013,2014,2015,2016,2014,2015,2016,2017))

   IndID Year
1    AAA 2013
2    AAA 2014
3    AAA 2015
4    BBB 2013
5    BBB 2014
6    BBB 2015
7    BBB 2016
8    CCC 2014
9    CCC 2015
10   CCC 2016
11   CCC 2017

I recognize this question is very similar to a previous post, but given the presence of NA values and slightly different data structure, I have not been able to produce the desired result with the previous response and would appreciate any suggestions. Moreover, as seen in the posted answers, there are additional solutions.

Community
  • 1
  • 1
B. Davis
  • 3,391
  • 5
  • 42
  • 78
  • 2
    You could use a list column or `do`: `library(tidyverse); dat %>% group_by(IndID) %>% mutate(MortDt = coalesce(MortDt, Sys.Date()), Year = seq(CptrDt, MortDt, by = 'year') %>% lubridate::year() %>% list()) %>% unnest()` – alistaire Feb 10 '17 at 04:53
  • or with `purrr::by_slice`: `dat %>% group_by(IndID) %>% mutate_if(lubridate::is.Date, coalesce, Sys.Date()) %>% by_slice(~seq(.x$CptrDt, .x$MortDt, by = 'year') %>% lubridate::year(), .collate = 'rows', .to = 'year')` – alistaire Feb 10 '17 at 05:04

1 Answers1

2

1- Using gsub, get the year from each row and form a sequence of it. Then use expand.grid to expand the value of IndID with the above sequence. Finally rbind the list of data frames into one data frame.

dat[is.na(dat$CptrDt), "CptrDt"] <- as.Date("01-01-2017", "%m-%d-%Y")
dat[is.na(dat$MortDt), "MortDt"] <- as.Date("01-01-2017", "%m-%d-%Y")

do.call('rbind', apply(dat, 1, function(x) {
                                             pattern <- '([0-9]{4})-[0-9]{2}-[0-9]{2}';
                                             y <- as.numeric( gsub( pattern, '\\1', x[2:3] ) );
                                             expand.grid( IndID = x[1], 
                                                          Year = seq( y[1], y[2], by = 1 ) )
                                            }))

#    IndID Year
# 1    AAA 2013
# 2    AAA 2014
# 3    AAA 2015
# 4    BBB 2013
# 5    BBB 2014
# 6    BBB 2015
# 7    BBB 2016
# 8    CCC 2014
# 9    CCC 2015
# 10   CCC 2016
# 11   CCC 2017

2- Using format based on the suggestion in the comment below.

dat[is.na(dat$CptrDt), "CptrDt"] <- as.Date("01-01-2017", "%m-%d-%Y")
dat[is.na(dat$MortDt), "MortDt"] <- as.Date("01-01-2017", "%m-%d-%Y")

dat$CptrDt <- format(dat$CptrDt, "%Y")
dat$MortDt <- format(dat$MortDt, "%Y")

do.call('rbind', apply(dat, 1, function(x) { expand.grid( IndID = x[1], 
                                                          Year = seq( as.numeric( x[2] ), as.numeric( x[3] ), by = 1 ) ) }))

Data:

dat <- data.frame(IndID = c("AAA","BBB","CCC"),
                  CptrDt  = as.Date(c("01-01-2013" ,"01-01-2013", "01-01-2014"),"%m-%d-%Y"),
                  MortDt =  as.Date(c("01-01-2015" ,"01-01-2016", NA),"%m-%d-%Y"))
Sathish
  • 12,453
  • 3
  • 41
  • 59