5

I apologize if this question is elementary, but I've been scouring the internet and I can't seem to find a simple solution.

I currently have a list of R objects (named vectors or dataframes of 1 variable, I can work with either), and I want to join them into 1 large dataframe with 1 row for each unique name/rowname, and 1 column for each element in the original list.

My starting list looks something like:

l1 <- list(df1 = data.frame(c(1,2,3), row.names = c("A", "B", "C")), 
       df2 = data.frame(c(2,6), row.names = c("B", "D")),
       df3 = data.frame(c(3,6,9), row.names = c("C", "D", "A")),
       df4 = data.frame(c(4,12), row.names = c("A", "E")))

And I want the output to look like:

data.frame("df1" = c(1,2,3,NA,NA),
+            "df2" = c(NA,2,NA,6,NA),
+            "df3" = c(9,NA,3,6,NA),
+            "df4" = c(4,NA,NA,NA,12), row.names = c("A", "B", "C", "D", "E"))
  df1 df2 df3 df4
A   1  NA   9   4
B   2   2  NA  NA
C   3  NA   3  NA
D  NA   6   6  NA
E  NA  NA  NA  12

I don't mind if the fill values are NA or 0 (ultimately I want 0 but that's an easy fix).

I'm almost positive that plyr::cbind.fill does exactly this, but I have been using dplyr in the rest of my script and I don't think using both is a good idea. dplyr::bind_cols does not seem to work with vectors of different lengths. I'm aware a very similar question has been asked here: R: Is there a good replacement for plyr::rbind.fill in dplyr? but as I mentioned, this solution doesn't actually seem to work. Neither does dplyr::full_join, even wrapped in a do.call. Is there a straightforward solution to this, or is the only solution to write a custom function?

Tom
  • 65
  • 1
  • 5
  • Not sure if you noticed, but since you put a vector inside `data.frame` with no name, you've got messy names like `c.1..2..3.` – camille Feb 05 '20 at 19:43
  • @camille I did notice that, this isn't the case in my actual data. The list elements start out as just named vectors, but I can easily coerce them to dataframes of 1 variable, and the column name ends up being identical to the corresponding list element name. – Tom Feb 05 '20 at 19:47

3 Answers3

7

We can convert the rownames to a column with rownames_to_column, then rename the second column, bind the list elements with bind_rows, and reshape to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
map_dfr(l1, ~ rownames_to_column(.x, 'rn') %>% 
              rename_at(2, ~'v1'), .id = 'grp') %>%        
   pivot_wider(names_from = grp, values_from = v1) %>% 
   column_to_rownames('rn')
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I'm not sure what I've done wrong or if it's just my institution's outdated version of RStudio, but I'm getting `Error in pivot_wider(., names_from = grp, values_from = v1) : could not find function "pivot_wider"` – Tom Feb 05 '20 at 19:54
  • 1
    @Tom Can you please check your `tidyr` version. It is available from 1.0.0 – akrun Feb 05 '20 at 19:56
  • 1
    I really like this approach! One minor thing: using `purrr::map_dfr()` you can skip the `bind_rows()` The line would look like this: `map_dfr(l1, ~ rownames_to_column(.x, 'rn') %>% 2, ~'v1'), .id = 'grp')` – Till Feb 05 '20 at 20:03
  • @Till Thank you. I forgot about that. – akrun Feb 05 '20 at 20:04
6

Here's a way with some purrr and dplyr functions. Create column names to represent each data frame—since each has only one column, this is easy with setNames, but with more columns you could use dplyr::rename. Do a full-join across the whole list based on the original row names, and fill NAs with 0.

library(dplyr)
library(purrr)

l1 %>%
  imap(~setNames(.x, .y)) %>%
  map(tibble::rownames_to_column) %>%
  reduce(full_join, by = "rowname") %>%
  mutate_all(tidyr::replace_na, 0)
#>   rowname df1 df2 df3 df4
#> 1       A   1   0   9   4
#> 2       B   2   2   0   0
#> 3       C   3   0   3   0
#> 4       D   0   6   6   0
#> 5       E   0   0   0  12
camille
  • 16,432
  • 18
  • 38
  • 60
2

Yet another purrr and dplyr option could be:

l1 %>%
 map2_dfr(.x = ., .y = names(.), ~ setNames(.x, .y) %>%
           rownames_to_column()) %>%
 group_by(rowname) %>%
 summarise_all(~ ifelse(all(is.na(.)), NA, first(na.omit(.))))

  rowname   df1   df2   df3   df4
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 A           1    NA     9     4
2 B           2     2    NA    NA
3 C           3    NA     3    NA
4 D          NA     6     6    NA
5 E          NA    NA    NA    12
tmfmnk
  • 38,881
  • 4
  • 47
  • 67