I am trying to extract data from a nested tibble. Within the outer tibble, not all nested tibbles may exist or be complete. If a column does not exist, I would like to return NA.

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

Now I'd like to extract the column 'Species' from each nested tibble, with the generated column filled with NA if no data are available, so that the result equals:

tibble(a_s = iris$Species, 
       b_s = NA, 
       c_s = NA)

Is there any way I could achieve this?

I naively tried:

transmute(df, a_s = a$Species,
              b_s = b$Species,
              c_s = c$Species)

which of course only works for a_s, generates a warning for b_s and throws an error for c_s.

I have tried creating a helper function to check whether each column exists, but this didn't work for nested data frames. Any ideas on how to solve this?

UPDATE: for clarity, I always want to generate the output as specified, while tibble c may or may not be there.

Joost Keuskamp

1 Answer

This uses grepl within ifelse to check each nested tibble for a Species column, and do.call to assemble the final tibble.

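library(dplyr)

# For each nested tibble name, take Species if present, otherwise NA;
# names(df[["c"]]) is NULL for the missing tibble, so any() is FALSE there
do.call(tibble, sapply(c("a", "b", "c"), function(x)
  ifelse(any(grepl("Species", names(df[[x]]))), 
         df[[x]]["Species"], 
         NA_character_))) %>% 
  # append "_s" to each column name
  rename_with(~ paste0(.x, "_s"))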
# A tibble: 150 × 3
   a_s    b_s   c_s  
   <fct>  <chr> <chr>
 1 setosa NA    NA   
 2 setosa NA    NA   
 3 setosa NA    NA   
 4 setosa NA    NA   
 5 setosa NA    NA   
 6 setosa NA    NA   
 7 setosa NA    NA   
 8 setosa NA    NA   
 9 setosa NA    NA   
10 setosa NA    NA   
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows
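
The same idea can also be sketched with an explicit helper instead of ifelse, which may be easier to read. This is only a minimal sketch: pull_or_na is a hypothetical helper name (not part of dplyr or tibble), and it assumes the missing columns should be plain logical NA, recycled to the number of rows of df.

```r
library(tibble)

# Same nested structure as in the question; `c = NULL` is dropped by tibble()
df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

# Hypothetical helper: return column `col` of the nested tibble `name`,
# or a vector of NA with nrow(df) elements if the tibble or column is missing
pull_or_na <- function(df, name, col) {
  nested <- if (name %in% names(df)) df[[name]] else NULL
  if (!is.null(nested) && col %in% names(nested)) {
    nested[[col]]
  } else {
    rep(NA, nrow(df))
  }
}

res <- tibble(a_s = pull_or_na(df, "a", "Species"),
              b_s = pull_or_na(df, "b", "Species"),
              c_s = pull_or_na(df, "c", "Species"))
```

Here res should have the three columns a_s, b_s and c_s regardless of whether c exists in df, since the existence checks are done on names rather than by subsetting directly.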
Andre Wildberg