I am looking for assistance with the following problem.
I have a data.table with a hierarchical identifier consisting of five levels. The format is [level_1]-[level_2]-[level_3].[level_4].[level_5]
(note the hyphen delimiting level 1,2 and 3 versus the period between 3,4, and 5). I have made this slightly smaller than my real data (which has 6 levels) -- I assume the solution for this will work the same.
Some toy data:
my_dt = setDT(data.frame("level_1" = c("H111", "H111", "H111", "H222", "H222", "H333"), "level_2" = c("a12", "a12", "b12", "c12", "c12", NA),
"level_3" = c("B9", "B2", "G1", NA, "F4", NA),
"level_4" = c("11", NA, "55", NA, "66", NA),
"level_5" = c("A", NA, "B", NA, NA, NA),
"key_level" = c(5,3,5,2,4,1)))
For example, the above identifiers when combined should look as follows (if they are being reported at their 'key-level' (or hierarchy depth, if you like).
full_key = c("H111-a12-B9.11.A", "H111-a12-B2", "H111-b12-G1.55.B", "H222-c12", "H222-c12-F4.66", "H333")
I'd like some help writing a function that takes a desired hierarchy depth (hlevel
), and reports the key to that specific depth (only if the key_level
is >= the hlevel
, otherwise return NA).
For example, if I want to report the examples above to level 3, I'd like to get the following result:
to_level_3 = c("H111-a12-B9", "H111-a12-B2", "H111-b12-G1", NA, "H222-c12-F4", NA)
(As you can see, all rows that had a key_level < 3 should be NA).
I'm struggling to get this to work. This is my attempt (may be more confusing than actually reading the problem!). This solution only sets up the start of the key where the delimiter is ".", and fails to return NA for those rows with a key_level
< than the hlevel
...
get_level_x_key = function(dt, hlevel)
{
my_columns = paste0("level_", 1:4)
my_first_max = min(hlevel, 2) # first part of key is delimited with hypthens
my_dt[,
#do.call(paste, c(
do.call(paste, c(.SD[key_level >= hlevel, my_columns[1:my_first_max], with = FALSE], list(sep="."))),]
#safe.ifelse(hlevel > 2, do.call(paste, c(.SD[, my_columns[2:hlevel], with = FALSE], list(sep="-"))), NA)), list(sep="-")]
#)]
}
get_level_x_key(my_dt, 3)
# Gives: [1] "H111-a12" "H111-a12" "H111-b12" "H222-c12"
I have read the SO question: paste two data.table columns, which made me attempt to use do.call -- but I don't know how to deal with the change in delimiter or how to return NA if the key_level is less than the required level. I had thought maybe the Reduce
function would be useful... but I am a bit lost.
I would appreciate any guidance, I am a bit stuck with how to break this problem down. Thanks!