Efficient way of subsetting nested list conditionally in R

Question

I have a large set of large, named, nested lists. The first-level names vary, while the second-level names follow fixed rules (examples are provided below).

An example of a correct list (x) is given below.

x <- list(`first-group` = list(val = c(534L, 582L, 298L, 645L, 314L, 
237L, 418L, 348L, 363L, 133L, 493L, 721L, 722L, 210L, 467L, 474L, 
145L, 638L, 545L, 330L, 709L, 712L, 674L, 492L, 262L, 663L, 609L, 
142L, 428L, 254L), co = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L)), `second-group` = list(val = c(505L, 647L, 
88L, 208L, 801L, 258L, 423L, 83L, 565L, 62L, 118L, 804L, 458L, 
357L, 327L, 138L, 586L, 340L, 473L, 335L, 720L, 170L, 159L, 207L, 
113L, 532L, 526L, 529L, 760L, 116L, 712L, 134L, 214L, 697L, 100L, 
123L, 227L, 411L, 285L, 659L, 379L, 775L, 176L), co = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), 
    `third-group` = list(val = c(713L, 721L, 683L, 526L, 699L, 
    555L, 563L, 672L, 619L, 603L, 588L, 533L, 622L, 724L, 616L, 
    644L, 730L, 716L, 660L, 663L, 611L, 669L, 644L, 664L, 679L, 
    514L, 579L, 525L, 533L, 541L, 530L, 564L, 584L, 673L, 592L, 
    726L, 548L, 563L, 727L, 646L, 708L, 557L, 586L, 592L, 693L, 
    620L, 548L, 705L, 510L, 677L, 539L, 603L, 726L, 525L, 597L, 
    563L, 712L), co = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0)), `fourth-group` = list(val = c(142L, 317L, 
    286L, 174L, 656L, 299L, 676L, 206L, 645L, 755L, 514L, 424L, 
    719L, 741L, 711L, 552L, 550L, 372L, 551L, 520L, 650L, 503L, 
    667L, 162L, 644L, 595L, 322L, 247L), co = c(0L, 0L, 0L, 0L, 
    1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L)))

These lists are produced from datasets that may contain errors. Since the lists are large, the errors are hard to spot. The structure of an erroneous list is preserved, but some variables have the wrong type (e.g. character or NA instead of numeric).

An example of an erroneous list (wrong_x) is given below.

wrong_x <- list(`first-group` = list(val = "this/is/character/variable", 
    co = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 
    1L, 0L)), `second-group` = list(val = c(505L, 647L, 88L, 
208L, 801L, 258L, 423L, 83L, 565L, 62L, 118L, 804L, 458L, 357L, 
327L, 138L, 586L, 340L, 473L, 335L, 720L, 170L, 159L, 207L, 113L, 
532L, 526L, 529L, 760L, 116L, 712L, 134L, 214L, 697L, 100L, 123L, 
227L, 411L, 285L, 659L, 379L, 775L, 176L), co = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `third-group` = list(
    val = c(713L, 721L, 683L, 526L, 699L, 555L, 563L, 672L, 619L, 
    603L, 588L, 533L, 622L, 724L, 616L, 644L, 730L, 716L, 660L, 
    663L, 611L, 669L, 644L, 664L, 679L, 514L, 579L, 525L, 533L, 
    541L, 530L, 564L, 584L, 673L, 592L, 726L, 548L, 563L, 727L, 
    646L, 708L, 557L, 586L, 592L, 693L, 620L, 548L, 705L, 510L, 
    677L, 539L, 603L, 726L, 525L, 597L, 563L, 712L), co = c(0, 
    0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `fourth-group` = list(
    val = NA, co = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L)))
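
To make the type problem visible before filtering, one option is to report the class of every `val` entry. The following is only a sketch in base R; `toy`, `val_class` and `bad_groups` are hypothetical names, and `toy` is a shortened, made-up stand-in for a list such as wrong_x.

```r
# Shortened, made-up stand-in for a list like wrong_x (hypothetical data)
toy <- list(
  `first-group`  = list(val = "this/is/character/variable", co = c(0L, 1L)),
  `second-group` = list(val = c(505L, 647L), co = c(0, 0)),
  `third-group`  = list(val = NA, co = c(0L, 1L))
)

# Class of each group's `val`; vapply() enforces one string per group
val_class <- vapply(toy, function(g) class(g$val)[1], character(1))

# Names of the groups whose `val` is not numeric (integer or double)
is_ok <- vapply(toy, function(g) is.numeric(g$val), logical(1))
bad_groups <- names(toy)[!is_ok]

print(val_class)
print(bad_groups)
```

Note that a bare NA has class "logical", so it fails `is.numeric()` in the same way a character string does.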

It may also happen that every sublist of interest in the list has the wrong variable type, as in the example below:

wrong2_x <- list(`first-group` = list(val = "this/is/character/variable", 
    co = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 
    1L, 0L)), `second-group` = list(val = "this/is/character/variable/too", co = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `third-group` = list(
    val = "and/this", co = c(0, 
    0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), `fourth-group` = list(
    val = NA, co = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L)))

I wrote a simple function that mimics my workflow. It filters the list on the "$val" sublists, keeping only the groups whose "$val" is numeric. If the prefiltered list is empty, the workflow should stop immediately and throw an error. The code is provided below:

my_function <- function(input_list){
  # data prefiltering
  input_list <- Filter(function(x) is.numeric(x$val), input_list)
  
  # stop early if nothing survived the prefilter
  if (length(input_list) == 0){
    stop("Better Call Saul.", call. = FALSE)
  } else {
    # other data-wrangling steps would go here; this is just a dummy assignment
    output_list <- input_list
  }
  return(output_list)
}

Is there a more elegant (code-efficient) way to achieve the same result?

What exactly is inelegant or code-inefficient about your solution? Are you having performance issues? – MrFlick, Dec 12 '22 at 21:50
@MrFlick I am trying to make my code as fast as possible, since it has to deal with large datasets in a reasonable amount of time. On my local computer I do have performance issues, while on the server I do not (however, my PC is no speed demon). – ramen, Dec 12 '22 at 22:16

score 1 · Accepted Answer · answered Dec 12 '22 at 21:59

The purrr package helps with list manipulation. For example:

library(purrr)

## returns the number of non-numeric 'val' entries (0 means the list is clean)
is_faulty_list <- function(the_list){
    the_list |> 
        map('val') |> ## pluck list members named 'val'
        discard(~ is.numeric(.x)) |> ## keep only non-numeric items
        length() ## should be zero (if only numeric items)
}

if(is_faulty_list(x)) print('calling Saul') ## prints nothing: all 'val' are numeric

if(is_faulty_list(wrong_x)) print('calling Saul')
#> [1] "calling Saul"
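
For comparison, here is a minimal base R sketch that needs neither purrr nor a pipe operator and that returns the filtered list, as my_function in the question does. The name `keep_numeric_groups`, the error message and the toy data are hypothetical. `Filter()` works along similar lines internally, so any speed difference is likely to be small; the main gain is that `vapply()` fails loudly if the predicate ever returns something other than a single logical.

```r
keep_numeric_groups <- function(input_list) {
  # One logical flag per group: TRUE when `val` is numeric (integer or double)
  ok <- vapply(input_list, function(g) is.numeric(g$val), logical(1))

  # Stop immediately if no group passes (this also covers an empty input list)
  if (!any(ok)) {
    stop("No group has a numeric `val`.", call. = FALSE)
  }

  # Logical indexing keeps the names of the surviving groups
  input_list[ok]
}

# Shortened, made-up data for illustration
toy <- list(
  a = list(val = "text", co = 1L),
  b = list(val = c(1L, 2L), co = c(0, 0))
)
res <- keep_numeric_groups(toy)
print(names(res))
```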

I_O

May I use dplyr's pipe operator in this case instead of R's native one? I know that (i) they do not always work the same and (ii) backward compatibility is an issue for me (R < 4.0). – ramen Dec 12 '22 at 22:19
In that case, I think you had better use dplyr's one. Now that you mention it, I remember replacing all base pipe operators with the older ones when I had to run a script on another server. AFAIK replacing `|>` with `%>%` is safe (unlike the other way round, as `|>` does not interpret the dot `.` as incoming data). https://stackoverflow.com/questions/54815607/r-combinations-with-dot-and-pipe-operator – I_O Dec 12 '22 at 23:09
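
As a sketch of what that replacement could look like for older R versions: the variant below swaps `|>` for `%>%`. It assumes purrr is installed; purrr re-exports magrittr's `%>%`, so dplyr does not need to be loaded for this. The name `count_faulty` and the toy lists are hypothetical.

```r
library(purrr)  # purrr re-exports magrittr's %>%, so dplyr is not required

# Hypothetical variant of is_faulty_list() using %>% instead of |>
count_faulty <- function(the_list) {
  the_list %>%
    map("val") %>%           # extract each group's `val`
    discard(is.numeric) %>%  # drop numeric entries, leaving the faulty ones
    length()                 # 0 means every `val` is numeric
}

# Shortened, made-up stand-ins for x and wrong_x
good <- list(a = list(val = 1:3, co = c(0, 1, 0)))
bad  <- list(a = list(val = "text", co = 0L), b = list(val = NA, co = 1L))

n_good <- count_faulty(good)
n_bad  <- count_faulty(bad)
```

Passing `is.numeric` directly to `discard()` avoids the `~ .x` lambda, so the function does not depend on how either pipe treats the dot.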
