
I have a nested list with three levels and need to reach the third level (i.e., resource) to check whether any special characters exist in the data. I tried to build a similar sample, but there is one difference I could not reproduce: while level 1 (foo) and level 2 (e.g., obj1) are lists, level 3 (resource) is list(s3:data.frame) in my real data. I ran the code below, but as you can see, the second part (i.e., filter) throws an error. Could you please tell me how to avoid this error?

    obj1 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = "RIGHT", device = "SENS"))
    obj2 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = "LEFT", device = "GOM"))

    x <- list(foo = obj1, bar = obj2)

    Dat <- lapply(x, function(tb) tb[sapply(tb, function(z) any(grepl("[^\x01-\x7F]", z), na.rm = TRUE))]) %>%
      dplyr::filter(if_any(everything(), ~ grepl("[^\x01-\x7F]", .)))

    Error in UseMethod("filter") :
      no applicable method for 'filter' applied to an object of class "list"
Rara
  • Hello Rara, if you provide an example of the input data and the expected output, it is easier to help you – Gerald T May 24 '23 at 11:40

1 Answer


You could recursively scan the list nodes with rapply:

    x |> rapply(f = \(node) grepl("[^\x01-\x7F]", node))
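As a rough illustration of what that call returns, here is a minimal sketch on the sample data from the question. The value `"caf\u00e9"` is a hypothetical replacement for one `device` entry, added only so that one leaf actually matches; the original sample contains no non-ASCII characters.

```r
# Sample data from the question; "caf\u00e9" is a hypothetical non-ASCII value
obj1 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = "RIGHT", device = "SENS"))
obj2 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = "LEFT", device = "caf\u00e9"))
x <- list(foo = obj1, bar = obj2)

# rapply() visits every leaf; with the default how = "unlist" the result
# is a named logical vector with one entry per leaf element
flags <- rapply(x, function(node) grepl("[^\x01-\x7F]", node))

# Names of the leaves that contain a character outside \x01-\x7F
flagged <- names(flags)[flags]
flagged
```

The names are built from the path through the list, which is what makes the result easy to turn into a table.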

To generate a tibble from the above:

    library(tibble)

    data.frame(has_invalid_character = x |> rapply(f = \(node) grepl("[^\x01-\x7F]", node)),
               content = x |> rapply(f = \(node) node)
               ) |>
      rownames_to_column('item') |>
      as_tibble()
    # A tibble: 10 x 3
       item                   has_invalid_character content
       <chr>                  <lgl>                 <chr>
     1 foo.resource.bodyPart1 FALSE                 leg
     2 foo.resource.bodyPart2 FALSE                 arm
     3 foo.resource.bodyPart3 FALSE                 knee
     4 foo.resource.side      FALSE                 RIGHT
     5 foo.resource.device    FALSE                 SENS
    ## etc.
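Because the result is a data frame rather than a list, the filtering step that failed in the question can be applied to it. A minimal base R sketch, assuming a hypothetical non-ASCII value (`"G\u00d6M"`) in place of one `device` entry so that there is something to find:

```r
# Sample data from the question; "G\u00d6M" is a hypothetical non-ASCII value
obj1 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = "RIGHT", device = "SENS"))
obj2 <- list(resource = list(bodyPart = c("leg", "arm", "knee"), side = "LEFT", device = "G\u00d6M"))
x <- list(foo = obj1, bar = obj2)

# Same two rapply() calls as above: one flags each leaf, one returns its value
res <- data.frame(
  has_invalid_character = rapply(x, function(node) grepl("[^\x01-\x7F]", node)),
  content = rapply(x, function(node) node)
)

# Move the path-like row names into a regular column
res$item <- rownames(res)
rownames(res) <- NULL

# Keep only the rows whose content was flagged
invalid <- res[res$has_invalid_character, c("item", "content")]
invalid
```

With dplyr loaded, `dplyr::filter(res, has_invalid_character)` should select the same rows.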
I_O
  • Thank you! It basically does what I need. Is it possible to have a column including the values where these invalid characters exist? I mean when has_invalid_char equals TRUE, next to it I can see what the value is. – Rara May 24 '23 at 15:28
  • See edited answer please. If that answers your question, please remember to close the ticket by marking the answer as "accepted". – I_O May 24 '23 at 16:05
  • Many thanks for the solution. Do you mind explaining a bit what this chunk means? f = \(node) – Rara May 25 '23 at 08:08
  • Sure. `f = ...` specifies the function to be applied to each item found by recursively scanning the list `x`. `\(node) node` is a shorthand for `function(node) {return(node)}`. So what this chunk does is return every list item, regardless of nesting level. The other one `grepl`s every list item (same order, so you can just cbind both resulting vectors to form a dataframe). – I_O May 25 '23 at 08:16
  • Very clear, thanks! Does it matter whether we use the %>% or the |> operator? – Rara May 25 '23 at 08:31
  • For that purpose I'd say no. Both the (relatively) new base R pipe operator and the different flavours of the "tidyverse" operator have their pros and cons: https://stackoverflow.com/questions/67633022/what-are-the-differences-between-rs-new-native-pipe-and-the-magrittr-pipe, https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/ . – I_O May 25 '23 at 08:40
  • moved discussion to [chat](https://chat.stackoverflow.com/rooms/253817/discussion-between-i-o-and-rara). – I_O May 25 '23 at 09:08
  • May I ask you to see if you can help with a follow-up question to this question? [link](https://stackoverflow.com/questions/76637800/how-to-add-a-new-column-to-the-output-within-an-rapply-function) – Rara Jul 12 '23 at 12:17
  • It took me a while; see the proposed answer at the indicated location, please – I_O Jul 16 '23 at 11:53
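
The two points raised in the comments above (the `\(node)` shorthand, and both `rapply` calls returning items in the same order) can be checked with a short sketch. It assumes R 4.1 or later, where both `\(x)` and `|>` are available, and uses a small hypothetical list rather than the data from the question.

```r
# Small hypothetical nested list, for illustration only
x <- list(a = list(b = c("u", "v")), c = "w")

# \(node) node is shorthand for function(node) node
short <- x |> rapply(f = \(node) node)
long  <- rapply(x, f = function(node) node)
same_result <- identical(short, long)

# Both rapply() calls walk the leaves in the same order, so their names
# line up and the two vectors can be combined column-wise
flags <- rapply(x, f = \(node) grepl("[^\x01-\x7F]", node))
same_order <- identical(names(flags), names(short))

c(same_result = same_result, same_order = same_order)
```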