Puh... even trying to frame the title properly already gives me a headache.

I have a config.yml with nested values and I would like to define an indexing function get_config() that accepts "path-like" value strings.

The "path entities" of the value string match the nested entity structure of the config file. Based on the path-like value the function should then go and grab the corresponding hierarchy entity (either "branches" or "leaves") from the config file.

Example

Suppose this is the structure of the config.yml:

default:
  column_names:
    col_id: "id"
    col_value: "value"
  column_orders:
    data_structure_a: [
      column_names/col_id,
      column_names/col_value
    ]
    data_structure_b: [
      column_names/col_value,
      column_names/col_id
    ]

Here's a parsed version for you to play around with:

x <- yaml::yaml.load(
'default:
  column_names:
    col_id: "id"
    col_value: "value"
  column_orders:
    data_structure_a: [
      column_names/col_id,
      column_names/col_value
    ]
    data_structure_b: [
      column_names/col_value,
      column_names/col_id
    ]'
)

Accessing top-level entities is easy with config::get(value):

config::get("column_names")
# $col_id
# [1] "id"
# 
# $col_value
# [1] "value"

config::get("column_orders")
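# $data_structure_a
# [1] "column_names/col_id"    "column_names/col_value"
#
# $data_structure_b
# [1] "column_names/col_value" "column_names/col_id"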

But I would also like to access deeper entities, e.g. column_names: col_id.

In pseudo code:

config::get("column_names:col_id")

or

config::get("column_orders/data_structure_a")

The best I could come up with so far: relying on unlist()

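library(magrittr)  # provides %>%

get_config <- function(value, sep = ":") {
  if (stringr::str_detect(value, stringr::fixed(sep))) {
    # unlist() joins nested names with ".", e.g. "column_names.col_id"
    value <- stringr::str_replace_all(value, stringr::fixed(sep), ".")
    configs <- config::get() %>% unlist()
    configs[value]
  } else {
    # No separator: plain top-level lookup
    config::get(value)
  }
}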

get_config("column_names")
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"

get_config("column_names:col_id")
# column_names.col_id 
# "id" 

Though not elegant, it works for most use cases, but it fails for unnamed list entities in the config file:

get_config("column_orders:data_structure_a")
# <NA> 
#   NA 

as my indexing approach doesn't play well with the result of unlist() on unnamed lists:

config::get() %>% unlist()
#             column_names.col_id          column_names.col_value
#                            "id"                         "value"
# column_orders.data_structure_a1 column_orders.data_structure_a2
#           "column_names/col_id"        "column_names/col_value"
# column_orders.data_structure_b1 column_orders.data_structure_b2
#        "column_names/col_value"           "column_names/col_id"

Thus, I'd like to "go recursive" but my brain says: "no way, dude"

Due diligence

This solution comes close (I guess).

But I keep thinking that I need something like purrr::map2_if() or purrr::pmap_if() (which AFAIK don't exist) instead of purrr::map_if(), because I need to recursively traverse not only the list behind config::get() but also a listified version of value (e.g. via stringr::str_split(value, sep) %>% unlist() %>% as.list()).

Rappster

2 Answers

You could also use purrr::pluck to index into a nested list by name if that is what you are after:

x <- yaml::yaml.load('
  column_names:
    col_id: "id"
    col_value: "value"
  column_orders:
    data_structure_a: [
      column_names/col_id,
      column_names/col_value
    ]
    data_structure_b: [
      column_names/col_value,
      column_names/col_id
    ]
  nested_list:
    element_1:
      element_2:
        value: "hello world"
  ')

purrr::pluck(x, "column_names", "col_id")
#> [1] "id"

purrr::pluck(x, "column_names")
#> $col_id
#> [1] "id"
#> 
#> $col_value
#> [1] "value"

purrr::pluck(x, "column_orders", "data_structure_a")
#> [1] "column_names/col_id"    "column_names/col_value"

purrr::pluck(x, "column_names", "col_notthere")
#> NULL
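
To get from a path-like string to the separate accessors that pluck() expects, the string can be split and the pieces spliced in with `!!!` (pluck() supports dynamic dots). The following is only a minimal sketch: `pluck_path()` and `cfg` are hypothetical names introduced for illustration, and `cfg` is a hand-written stand-in for the parsed config.

```r
library(purrr)

# Stand-in for the parsed config (trimmed version of the list used in this thread)
cfg <- list(
  column_names = list(col_id = "id", col_value = "value"),
  column_orders = list(
    data_structure_a = c("column_names/col_id", "column_names/col_value")
  )
)

# Hypothetical helper: split the path on `sep`, then splice the keys into pluck()
pluck_path <- function(lst, path, sep = "/") {
  keys <- strsplit(path, sep, fixed = TRUE)[[1]]
  pluck(lst, !!!as.list(keys))
}

pluck_path(cfg, "column_names/col_id")
#> [1] "id"

pluck_path(cfg, "column_orders/data_structure_a")
#> [1] "column_names/col_id"    "column_names/col_value"
```

Like pluck() itself, this returns NULL for a path that does not exist. If an error is preferred, purrr::chuck() could be swapped in for pluck().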
Joris C.

I came up with a solution based on Recall().

However, while digging through the internet trying to get here, I recall having read somewhere that Recall() is generally not a very (memory) efficient way of doing recursion in R. I would also appreciate additional hints on how to do recursion the tidy way with purrr and friends.
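
As one possible "tidy" direction, the traversal does not strictly need recursion: the path keys can be folded over the list with purrr::reduce(), descending one level per key. This is a rough sketch and not a drop-in replacement for the function below; `get_by_path()` and `cfg` are hypothetical names, and `cfg` is a hand-written stand-in for the parsed config.

```r
library(purrr)

# Stand-in for the parsed config (same shape as the config in this thread, trimmed)
cfg <- list(
  column_names = list(col_id = "id", col_value = "value"),
  nested_list = list(element_1 = list(element_2 = list(value = "hello world")))
)

# Hypothetical helper: fold the path keys over the list, one level per step
get_by_path <- function(lst, path, sep = "/") {
  keys <- strsplit(path, sep, fixed = TRUE)[[1]]
  reduce(keys, function(node, key) {
    # Fail loudly if the current node cannot be indexed by `key`
    if (!is.list(node) || !(key %in% names(node))) {
      stop("No such element in list: ", key)
    }
    node[[key]]
  }, .init = lst)
}

get_by_path(cfg, "column_names/col_id")
# [1] "id"

get_by_path(cfg, "nested_list/element_1/element_2/value")
# [1] "hello world"
```

Unlike the Recall() version, this sketch also errors when the path continues past a leaf, which may or may not be the behavior you want.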

Config file content

Calling get_config() requires a config.yml file with the above content in your project's root directory (as given by here::here()), but you can test get_list_element_recursively() with this workaround:

x <- yaml::yaml.load('
  column_names:
    col_id: "id"
    col_value: "value"
  column_orders:
    data_structure_a: [
      column_names/col_id,
      column_names/col_value
    ]
    data_structure_b: [
      column_names/col_value,
      column_names/col_id
    ]
  nested_list:
    element_1:
      element_2:
        value: "hello world"
  ')

Function defs

get_config <- function(value, sep = "/") {
  get_list_element_recursively(
    config::get(),
    stringr::str_split(value, sep, simplify = TRUE)
  )
}

get_list_element_recursively <- function(
  lst,
  el,
  .el_trace = el,
  .level_trace = 1
) {
  # Reached leaf:
  if (!is.list(lst)) {
    return(lst)
  }

  # Element not in list:
  if (!(el[1] %in% names(lst))) {
    message("Current list branch:")
    # str() prints to stdout and returns NULL, so capture its output first
    message(paste(utils::capture.output(str(lst)), collapse = "\n"))
    message("Trace of indexing vec (last element is invalid):")
    message(stringr::str_c(.el_trace[seq_len(.level_trace)], collapse = "/"))
    stop(stringr::str_glue("No such element in list: {el[1]}"))
  }

  lst <- lst[[ el[1] ]]

  if (length(el) > 1) {
    # Continue if there are additional elements in `el` vec;
    # `.level_trace` is a scalar depth counter, incremented per level
    Recall(lst, el[-1], .el_trace, .level_trace = .level_trace + 1)
  } else {
    # Otherwise return last indexing result:
    lst
  }
}

Testing get_config()

get_config("column_names")
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"

get_config("column_names/col_id")
# [1] "id"

get_config("column_names/col_nonexisting")
# Current list branch:
#   List of 2
# $ col_id   : chr "id"
# $ col_value: chr "value"
#
# Trace of indexing vec (last element is invalid):
#   column_names/col_nonexisting
# Error in get_list_element_recursively(config::get(), stringr::str_split(value,  :
#     No such element in list: col_nonexisting

get_config("column_orders")
# $data_structure_a
# [1] "column_names/col_id"    "column_names/col_value"
#
# $data_structure_b
# [1] "column_names/col_value" "column_names/col_id"

get_config("column_orders/data_structure_a")
# [1] "column_names/col_id"    "column_names/col_value"

Testing get_list_element_recursively()

get_list_element_recursively(x, c("column_names"))
# $col_id
# [1] "id"
#
# $col_value
# [1] "value"

get_list_element_recursively(x, c("column_names", "col_id"))
# [1] "id"

get_list_element_recursively(x, c("column_names", "col_notthere"))
# Current list branch:
#   List of 2
# $ col_id   : chr "id"
# $ col_value: chr "value"
#
# Trace of indexing vec (last element is invalid):
#   column_names/col_notthere
# Error in get_list_element_recursively(x, c("column_names", "col_notthere")) :
#   No such element in list: col_notthere
Rappster