0

I want to combine date columns by the latest date per row (if it is different) but with keeping the ID column. My data frame looks like the image. I want to keep the NA rows. As you can see in some rows the timestamp_c is filled and not the timestamp (in some others, it is the opposite). I want to keep the column which is completed and not the NA. I tried to follow this but I could not find a solution

enter image description here

Swift Dev Journal
  • 19,282
  • 4
  • 56
  • 66
Meli
  • 33
  • 6
  • 1
    You could try pmax(), ifelse() or coalesce(), see more details here: https://stackoverflow.com/questions/27850344/coalesce-two-string-columns-with-alternating-missing-values-to-one – Mario Nov 01 '21 at 13:56
  • 2
    my PC does not suppert .NORM files yet.. https://xkcd.com/2116/ – Wimpel Nov 01 '21 at 14:05

1 Answers1

0
library(data.table)

df <- data.table(
  ID = LETTERS[1:7],
  timestamp_c = lubridate::ymd("2021-03-08", NA, NA, "2021-04-06", NA, "2021-04-06", "2021-04-07"),
  timestamp = lubridate::ymd(NA, NA, NA, "2021-04-06", "2021-05-05", "2021-04-07", "2021-04-06")
)

df[, new_timestamp := max(timestamp_c, timestamp, na.rm = TRUE), by = ID]

#    ID timestamp_c  timestamp new_timestamp
# 1:  A  2021-03-08       <NA>    2021-03-08
# 2:  B        <NA>       <NA>          <NA>
# 3:  C        <NA>       <NA>          <NA>
# 4:  D  2021-04-06 2021-04-06    2021-04-06
# 5:  E        <NA> 2021-05-05    2021-05-05
# 6:  F  2021-04-06 2021-04-07    2021-04-07
# 7:  G  2021-04-07 2021-04-06    2021-04-07
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22