I want to combine date columns by the latest date per row (if it is different) but with keeping the ID column. My data frame looks like the image. I want to keep the NA rows. As you can see in some rows the timestamp_c is filled and not the timestamp (in some others, it is the opposite). I want to keep the column which is completed and not the NA. I tried to follow this but I could not find a solution
Asked
Active
Viewed 164 times
0
-
1You could try pmax(), ifelse() or coalesce(), see more details here: https://stackoverflow.com/questions/27850344/coalesce-two-string-columns-with-alternating-missing-values-to-one – Mario Nov 01 '21 at 13:56
-
2my PC does not suppert .NORM files yet.. https://xkcd.com/2116/ – Wimpel Nov 01 '21 at 14:05
1 Answers
0
library(data.table)
df <- data.table(
ID = LETTERS[1:7],
timestamp_c = lubridate::ymd("2021-03-08", NA, NA, "2021-04-06", NA, "2021-04-06", "2021-04-07"),
timestamp = lubridate::ymd(NA, NA, NA, "2021-04-06", "2021-05-05", "2021-04-07", "2021-04-06")
)
df[, new_timestamp := max(timestamp_c, timestamp, na.rm = TRUE), by = ID]
# ID timestamp_c timestamp new_timestamp
# 1: A 2021-03-08 <NA> 2021-03-08
# 2: B <NA> <NA> <NA>
# 3: C <NA> <NA> <NA>
# 4: D 2021-04-06 2021-04-06 2021-04-06
# 5: E <NA> 2021-05-05 2021-05-05
# 6: F 2021-04-06 2021-04-07 2021-04-07
# 7: G 2021-04-07 2021-04-06 2021-04-07

Merijn van Tilborg
- 5,452
- 1
- 7
- 22