1

There is probably a simple solution for this but I've been struggling to get this, so I appreciate any help.

I have a variable called "event" which is binominal, and the date column indicates the time of event. I need to make a variable based on date and date.2 that will have values if date and if date was missing use date.2 values. This is the date with my final outcome. Please help me how to write a code for it.

df <- structure(list(X = c(11L, 6L, 12L, 8L, 9L, 10L, 1L, 3L, 5L, 7L, 
2L, 4L), date = c("1/2/1999", NA, NA, NA, "9/1/1999", NA, "12/7/1999", 
NA, NA, NA, "1/31/1999", NA), event = c(1L, 0L, 0L, 0L, 1L, 0L, 
1L, 0L, 0L, 0L, 1L, 0L), date.2 = c("12/10/1999", "12/9/1999", 
"11/22/1999", "9/10/1999", "10/25/1999", "12/14/1999", "9/17/1999", 
"9/23/1999", "9/21/1999", "9/26/1999", "12/17/1999", "10/22/1999"
), final_date = c("1/2/1999", "12/9/1999", "11/22/1999", "9/10/1999", 
"9/1/1999", "12/14/1999", "12/7/1999", "9/23/1999", "9/21/1999", 
"9/26/1999", "1/31/1999", "10/22/1999")), class = "data.frame", row.names = c(NA, -12L))

enter image description here

Peace Wang
  • 2,399
  • 1
  • 8
  • 15
Ali Roghani
  • 495
  • 2
  • 7
  • 1
    Try `df %>% mutate(final_date = coalesce(date, date.2))` – Karthik S Apr 04 '21 at 16:34
  • 1
    Does this answer your question? [How to implement coalesce efficiently in R](https://stackoverflow.com/questions/19253820/how-to-implement-coalesce-efficiently-in-r) – Karthik S Apr 04 '21 at 16:35

1 Answers1

2

Update: I have added a second version which applies this method if the columns were dates.

The solution I would use to do this is to convert the data to using a tibble and use the mutate function. Using mutate we can create a new column called date_final which uses ifelse to check if date == NA (check using the is.na() function)then we use date2 otherwise use date.

I have shown this below for this example

library(tidyverse)

df <- structure(list(X = c(11L, 6L, 12L, 8L, 9L, 10L, 1L, 3L, 5L, 7L, 
                     2L, 4L), date = c("1/2/1999", NA, NA, NA, "9/1/1999", NA, "12/7/1999", 
                                       NA, NA, NA, "1/31/1999", NA), event = c(1L, 0L, 0L, 0L, 1L, 0L, 
                                                                               1L, 0L, 0L, 0L, 1L, 0L), date.2 = c("12/10/1999", "12/9/1999", 
                                                                                                                   "11/22/1999", "9/10/1999", "10/25/1999", "12/14/1999", "9/17/1999", 
                                                                                                                   "9/23/1999", "9/21/1999", "9/26/1999", "12/17/1999", "10/22/1999"
                                                                               )), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                               -12L))

# Use tibble for dataframe
df %>% 
  tibble %>% 
# use ifelse(condition, true, false) which is equivalent to an if else loop over the rows
  mutate(date_final = ifelse(is.na(date), date.2, date))
#> # A tibble: 12 x 5
#>        X date      event date.2     date_final
#>    <int> <chr>     <int> <chr>      <chr>     
#>  1    11 1/2/1999      1 12/10/1999 1/2/1999  
#>  2     6 <NA>          0 12/9/1999  12/9/1999 
#>  3    12 <NA>          0 11/22/1999 11/22/1999
#>  4     8 <NA>          0 9/10/1999  9/10/1999 
#>  5     9 9/1/1999      1 10/25/1999 9/1/1999  
#>  6    10 <NA>          0 12/14/1999 12/14/1999
#>  7     1 12/7/1999     1 9/17/1999  12/7/1999 
#>  8     3 <NA>          0 9/23/1999  9/23/1999 
#>  9     5 <NA>          0 9/21/1999  9/21/1999 
#> 10     7 <NA>          0 9/26/1999  9/26/1999 
#> 11     2 1/31/1999     1 12/17/1999 1/31/1999 
#> 12     4 <NA>          0 10/22/1999 10/22/1999

Created on 2021-04-04 by the reprex package (v2.0.0)

Let me know if this needs modifying due to a misunderstanding or you neeed clarification on any functions.

In order to apply this method to dates we can apply an identical method but we need to convert the date_final column from a dbl to a date. I use the lubridate function as_date() to do this.

library(tidyverse)
library(lubridate)

df <- structure(list(X = c(11L, 6L, 12L, 8L, 9L, 10L, 1L, 3L, 5L, 7L, 
                           2L, 4L), date = c("1/2/1999", NA, NA, NA, "9/1/1999", NA, "12/7/1999", 
                                             NA, NA, NA, "1/31/1999", NA), event = c(1L, 0L, 0L, 0L, 1L, 0L, 
                                                                                     1L, 0L, 0L, 0L, 1L, 0L), date.2 = c("12/10/1999", "12/9/1999", 
                                                                                                                         "11/22/1999", "9/10/1999", "10/25/1999", "12/14/1999", "9/17/1999", 
                                                                                                                         "9/23/1999", "9/21/1999", "9/26/1999", "12/17/1999", "10/22/1999"
                                                                                     )), class = "data.frame", row.names = c(NA, 
                                                                                                                             -12L))
df<- df %>% 
  tibble

df$date <- df$date %>% as.Date(format = "%m/%d/%y")
df$date.2 <- df$date.2 %>% as.Date(format = "%m/%d/%y")

df
#> # A tibble: 12 x 4
#>        X date       event date.2    
#>    <int> <date>     <int> <date>    
#>  1    11 2019-01-02     1 2019-12-10
#>  2     6 NA             0 2019-12-09
#>  3    12 NA             0 2019-11-22
#>  4     8 NA             0 2019-09-10
#>  5     9 2019-09-01     1 2019-10-25
#>  6    10 NA             0 2019-12-14
#>  7     1 2019-12-07     1 2019-09-17
#>  8     3 NA             0 2019-09-23
#>  9     5 NA             0 2019-09-21
#> 10     7 NA             0 2019-09-26
#> 11     2 2019-01-31     1 2019-12-17
#> 12     4 NA             0 2019-10-22

df <- df %>% 
  mutate(date_final = ifelse(is.na(date), date.2, date))

df$date_final <- df$date_final %>% as_date()

df
#> # A tibble: 12 x 5
#>        X date       event date.2     date_final
#>    <int> <date>     <int> <date>     <date>    
#>  1    11 2019-01-02     1 2019-12-10 2019-01-02
#>  2     6 NA             0 2019-12-09 2019-12-09
#>  3    12 NA             0 2019-11-22 2019-11-22
#>  4     8 NA             0 2019-09-10 2019-09-10
#>  5     9 2019-09-01     1 2019-10-25 2019-09-01
#>  6    10 NA             0 2019-12-14 2019-12-14
#>  7     1 2019-12-07     1 2019-09-17 2019-12-07
#>  8     3 NA             0 2019-09-23 2019-09-23
#>  9     5 NA             0 2019-09-21 2019-09-21
#> 10     7 NA             0 2019-09-26 2019-09-26
#> 11     2 2019-01-31     1 2019-12-17 2019-01-31
#> 12     4 NA             0 2019-10-22 2019-10-22

Created on 2021-04-04 by the reprex package (v2.0.0)

Joel Kandiah
  • 1,465
  • 5
  • 15