Join tibbles in list to one tibble

Question

I have a list of two data frames

a = list(
        mtcars %>% as_tibble() %>% select(-vs), 
        mtcars %>% as_tibble() %>% sample_n(17)
    )

and I add a new column to each data set with

b = a %>% 
    map(~ mutate(.x, class = floor(runif(nrow(.x), 0, 2)))) %>%
    map(~ nest(.x, -class))

Now I want to join the two list elements to one tibble based on class. Specifically, I am looking for a "smoother" solution than inner_join(pluck(b, 1), pluck(b, 2), "class") which gives the desired results but quickly gets messy if more data sets are involved in the list a.

You can pipe this into `reduce` with `left_join` or `inner_join`, joining by `class`. Similar to https://stackoverflow.com/q/48452421/5325862 — camille, May 15 '19 at 20:46
Ah, maybe I should have made that clear. With the above code I get a list with two tibbles in it. The tibbles have class as the first column and tibbles as the second column. I want to merge the tibbles in the list _but_ keep the tibbles in the tibbles unaffected (that is, I don't want to merge the tibbles in the tibbles but only the tibbles in the list) — Syd Amerikaner, May 15 '19 at 20:52
`inner_join(pluck(b, 1), pluck(b, 2), "class")` is what I want basically. But I thought that there may be a purrr/tidyverse solution that immediately combines the list elements. — Syd Amerikaner, May 15 '19 at 21:26
There might be, but it's hard to know without seeing your intended output. You can edit that into the question — camille, May 15 '19 at 21:35

score 3 · Accepted Answer · answered May 16 '19 at 01:56

This question is not super clear, but it seemed like there might be enough use cases to go for it. I added a few more data frames to a, constructed similarly, because the sample you used is too small to really see what you need to deal with.

library(tidyverse)

set.seed(123)
a <- list(
  mtcars %>% as_tibble() %>% select(-vs), 
  mtcars %>% as_tibble() %>% sample_n(17),
  mtcars %>% as_tibble() %>% slice(1:10),
  mtcars %>% as_tibble() %>% select(mpg, cyl, disp)
) 
# same construction of b as in the question
b <- a %>% 
  map(~ mutate(.x, class = floor(runif(nrow(.x), 0, 2)))) %>%
  map(~ nest(.x, -class))

You can use purrr::reduce to carry out the inner_join call repeatedly, returning a single data frame of nested data frames. That's straightforward enough, but I couldn't figure out a good way to supply the suffix argument to the join, which assigns .x and .y by default to differentiate between duplicate column names. So you get these weird names:

b %>%
  reduce(inner_join, by = "class")
#> # A tibble: 2 x 5
#>   class data.x            data.y           data.x.x         data.y.y       
#>   <dbl> <list>            <list>           <list>           <list>         
#> 1     1 <tibble [11 × 10… <tibble [8 × 11… <tibble [3 × 11… <tibble [17 × …
#> 2     0 <tibble [21 × 10… <tibble [9 × 11… <tibble [7 × 11… <tibble [15 × …
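
To see why a single reduce call can stand in for the hand-written joins, here is a minimal sketch on toy data. The list x and its contents are made up for illustration and are not from the question. reduce applies inner_join pairwise from left to right, so it should give the same result as nesting the calls yourself.

```r
library(dplyr)
library(purrr)

# Hypothetical toy list: three tibbles sharing the key column `class`
x <- list(
  tibble(class = c(0, 1), data = c("a", "b")),
  tibble(class = c(0, 1), data = c("c", "d")),
  tibble(class = c(0, 1), data = c("e", "f"))
)

# reduce() folds the list from the left, passing `by` on to every join
by_reduce <- reduce(x, inner_join, by = "class")

# The same thing written out by hand
by_hand <- inner_join(
  inner_join(x[[1]], x[[2]], by = "class"),
  x[[3]],
  by = "class"
)

# Compare the two results
isTRUE(all.equal(by_reduce, by_hand))
```

Each join only adds suffixes when both sides have a clashing column name at that step, which is why the names get less predictable as the list grows.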

You could probably deal with the names by creating something like data1, data2, etc before the reduce, but the quickest thing I decided on was replacing the suffixes with just the index of each data frame from the list b. A more complicated naming scheme would be a task for a different question.

b %>%
  reduce(inner_join, by = "class") %>%
  rename_at(vars(starts_with("data")), 
            str_replace, "(\\.\\w)+$", as.character(1:length(b))) %>%
  names()
#> [1] "class" "data1" "data2" "data3" "data4"
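
The other idea, naming the nested columns before the reduce, could look roughly like this. It is a sketch under a few assumptions: rename_data is a hypothetical helper, the list is the two-element one from the question, and tidyr is at least version 1.0, where nest(data = -class) names the list column data.

```r
library(tidyverse)

set.seed(123)
a <- list(
  mtcars %>% as_tibble() %>% select(-vs),
  mtcars %>% as_tibble() %>% sample_n(17)
)

b <- a %>%
  map(~ mutate(.x, class = floor(runif(nrow(.x), 0, 2)))) %>%
  map(~ nest(.x, data = -class))

# Hypothetical helper: rename the list column `data` to data1, data2, ...
# using the position of the data frame in the list
rename_data <- function(df, i) {
  rename(df, !!paste0("data", i) := data)
}

joined <- b %>%
  imap(rename_data) %>%              # for an unnamed list, imap() passes the index
  reduce(inner_join, by = "class")   # no clashing names, so no suffixes are added

names(joined)
```

Because every nested column already has a unique name when the joins run, the suffix argument never comes into play.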
