I would like to find the best way to melt a data.table with na.rm applied only to the first element of the measure.vars list.
I have a data.table as follows:
library(data.table)
library(lubridate)  # for dmy()

# One row per user; each activity (visit, offer) has an id column and a date column
dt.master <- data.table(user       = seq(1, 5),
                        visit_id   = c(2, 4, NA, 4, 8),
                        visit_date = c(dmy("10/02/2018"), dmy("11/04/2018"), NA, dmy("02/03/2018"), NA),
                        offer_id   = c(1, 3, NA, NA, NA),
                        offer_date = c(dmy("15/02/2018"), dmy("18/04/2018"), NA, NA, NA))
This gives dt.master:
   user visit_id visit_date offer_id offer_date
1:    1        2 2018-02-10        1 2018-02-15
2:    2        4 2018-04-11        3 2018-04-18
3:    3       NA       <NA>       NA       <NA>
4:    4        4 2018-03-02       NA       <NA>
5:    5        8       <NA>       NA       <NA>
I want to get, for each user, the "story" of their commercial activity (that is, their visits and their offers), so I melt the table:
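# Each vector in measure.vars becomes one value column (level_id, level_date);
# the visit_* columns are stacked as level 1 and the offer_* columns as level 2
dt.melted <- melt(dt.master,
                  id.vars = "user",
                  measure.vars = list(c("visit_id", "offer_id"), c("visit_date", "offer_date")),
                  variable.name = "level",
                  value.name = c("level_id", "level_date"))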
This gives dt.melted:
    user level level_id level_date
 1:    1     1        2 2018-02-10
 2:    2     1        4 2018-04-11
 3:    3     1       NA       <NA>
 4:    4     1        4 2018-03-02
 5:    5     1        8       <NA>
 6:    1     2        1 2018-02-15
 7:    2     2        3 2018-04-18
 8:    3     2       NA       <NA>
 9:    4     2       NA       <NA>
10:    5     2       NA       <NA>
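As an aside, the same reshape can likely be written with data.table's patterns() helper instead of spelling out the column names, because the j-th column matched by each pattern is stacked as level j. This is a sketch, assuming a data.table version whose melt() supports patterns(); it rebuilds dt.master so it runs on its own, and dt.patterns is a hypothetical name:

```r
library(data.table)
library(lubridate)

# Sample data from the question, rebuilt so this block is self-contained
dt.master <- data.table(user       = seq(1, 5),
                        visit_id   = c(2, 4, NA, 4, 8),
                        visit_date = c(dmy("10/02/2018"), dmy("11/04/2018"), NA, dmy("02/03/2018"), NA),
                        offer_id   = c(1, 3, NA, NA, NA),
                        offer_date = c(dmy("15/02/2018"), dmy("18/04/2018"), NA, NA, NA))

# patterns() selects measure columns by regex, in column order:
#   "_id$"   -> visit_id, offer_id
#   "_date$" -> visit_date, offer_date
# so visit_* becomes level 1 and offer_* becomes level 2, as with the explicit list
dt.patterns <- melt(dt.master,
                    id.vars = "user",
                    measure.vars = patterns("_id$", "_date$"),
                    variable.name = "level",
                    value.name = c("level_id", "level_date"))
print(dt.patterns)
```

Note that "user" does not end in "_id", so it is not picked up as a measure column.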
However, I don't want NAs to appear in the level_id column, i.e. I want:
   user level level_id level_date
1:    1     1        2 2018-02-10
2:    2     1        4 2018-04-11
3:    4     1        4 2018-03-02
4:    5     1        8       <NA>
5:    1     2        1 2018-02-15
6:    2     2        3 2018-04-18
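For comparison, this target table can be obtained by melting without na.rm and filtering on level_id afterwards. This is a minimal sketch, not necessarily the most efficient approach, assuming the dt.master defined above (rebuilt here so the block runs on its own); dt.wanted is a hypothetical name:

```r
library(data.table)
library(lubridate)

dt.master <- data.table(user       = seq(1, 5),
                        visit_id   = c(2, 4, NA, 4, 8),
                        visit_date = c(dmy("10/02/2018"), dmy("11/04/2018"), NA, dmy("02/03/2018"), NA),
                        offer_id   = c(1, 3, NA, NA, NA),
                        offer_date = c(dmy("15/02/2018"), dmy("18/04/2018"), NA, NA, NA))

# Melt WITHOUT na.rm, so rows with a missing date are kept...
dt.melted <- melt(dt.master,
                  id.vars = "user",
                  measure.vars = list(c("visit_id", "offer_id"), c("visit_date", "offer_date")),
                  variable.name = "level",
                  value.name = c("level_id", "level_date"))

# ...then drop only the rows whose level_id is NA; level_date may still be NA
dt.wanted <- dt.melted[!is.na(level_id)]
print(dt.wanted)
```

Because only level_id is tested, the row for user 5 at level 1 survives with an NA level_date.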
Unfortunately, the data quality of the sample is really bad, so level_date is not always available. Thus na.rm = TRUE is not an option: it also drops rows where only level_date is missing (user 5 at level 1), as this shows:
# Same melt as above, but with na.rm = TRUE
dt.melted.na <- melt(dt.master,
                     id.vars = "user",
                     measure.vars = list(c("visit_id", "offer_id"), c("visit_date", "offer_date")),
                     variable.name = "level",
                     value.name = c("level_id", "level_date"),
                     na.rm = TRUE)
This gives dt.melted.na:
   user level level_id level_date
1:    1     1        2 2018-02-10
2:    2     1        4 2018-04-11
3:    4     1        4 2018-03-02
4:    1     2        1 2018-02-15
5:    2     2        3 2018-04-18
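The outputs above suggest that, with several value columns, na.rm = TRUE drops a row when any of them is NA. The sketch below checks that reading by filtering dt.melted on both value columns and comparing the result with the na.rm output; mvars and dt.complete are hypothetical names, and dt.master is rebuilt so the block runs on its own:

```r
library(data.table)
library(lubridate)

dt.master <- data.table(user       = seq(1, 5),
                        visit_id   = c(2, 4, NA, 4, 8),
                        visit_date = c(dmy("10/02/2018"), dmy("11/04/2018"), NA, dmy("02/03/2018"), NA),
                        offer_id   = c(1, 3, NA, NA, NA),
                        offer_date = c(dmy("15/02/2018"), dmy("18/04/2018"), NA, NA, NA))

mvars <- list(c("visit_id", "offer_id"), c("visit_date", "offer_date"))

dt.melted <- melt(dt.master, id.vars = "user", measure.vars = mvars,
                  variable.name = "level", value.name = c("level_id", "level_date"))
dt.melted.na <- melt(dt.master, id.vars = "user", measure.vars = mvars,
                     variable.name = "level", value.name = c("level_id", "level_date"),
                     na.rm = TRUE)

# Keep a row only if BOTH value columns are present
dt.complete <- dt.melted[!is.na(level_id) & !is.na(level_date)]
print(dt.complete)
```

If the two tables agree, that explains why user 5 disappears entirely: level 1 has no date and level 2 has no id.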
Is there a way to apply na.rm = TRUE only to the first element of the list in measure.vars? I am currently exploring other workarounds (like filling visit_date and offer_date with dummy dates when visit_id and offer_id are available), but I would like to know if there is an elegant solution.
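One possibly tidier option is data.table's na.omit() method, whose cols argument restricts the NA check to the chosen columns. The sketch below wraps it in a helper; melt_na_first() is a hypothetical name, not a data.table function, and it assumes measure.vars is passed as an explicit list of column names:

```r
library(data.table)
library(lubridate)

dt.master <- data.table(user       = seq(1, 5),
                        visit_id   = c(2, 4, NA, 4, 8),
                        visit_date = c(dmy("10/02/2018"), dmy("11/04/2018"), NA, dmy("02/03/2018"), NA),
                        offer_id   = c(1, 3, NA, NA, NA),
                        offer_date = c(dmy("15/02/2018"), dmy("18/04/2018"), NA, NA, NA))

# Hypothetical helper (not part of data.table): melt as usual, then drop rows
# where the FIRST value column is NA, leaving NAs in the other value columns
melt_na_first <- function(data, id.vars, measure.vars,
                          variable.name = "variable", value.name) {
  out <- melt(data,
              id.vars = id.vars,
              measure.vars = measure.vars,
              variable.name = variable.name,
              value.name = value.name)
  # The data.table method of na.omit() only checks the columns given in 'cols'
  na.omit(out, cols = value.name[1])
}

dt.story <- melt_na_first(dt.master,
                          id.vars = "user",
                          measure.vars = list(c("visit_id", "offer_id"),
                                              c("visit_date", "offer_date")),
                          variable.name = "level",
                          value.name = c("level_id", "level_date"))
print(dt.story)
```

This still melts every row first and filters afterwards, so it is a post-processing step rather than a partial na.rm inside melt() itself.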