1

I have about 100 data frames that are keyed by row.names. I need to merge them all into one table, but there is some missingingness so there is differing lengths. I have setup test data frames like so:

df1 = data.frame(row.names=c("chr1","chr2","chr3","chr4","chr5"),v1=c(10,43,1,44,598))
df2 = data.frame(row.names=c("chr1","chr2","chr4","chr5","chr6","chr7"),v2=c(6,64,21,98,10,20))
df3 = data.frame(row.names=c("chr2","chr3","chr4","chr5","chr6","chr7"),v3=c(20,30,40,50,60,70))

> df1
      v1
chr1  10
chr2  43
chr3   1
chr4  44
chr5 598
> df2
     v2
chr1  6
chr2 64
chr4 21
chr5 98
chr6 10
chr7 20
> df3
     v3
chr2 20
chr3 30
chr4 40
chr5 50
chr6 60
chr7 70

The desired output would be:

        v1  v2  v3
chr1    10  6   NA
chr2    43  64  20
chr3    1   NA  30
chr4    44  21  40
chr5    598 98  50
chr6    NA  10  60
chr7    NA  20  70


So some way to merge df1, df2, df3, ..., dfn.

bhffs
  • 123
  • 1
  • 6

1 Answers1

1

We could get all the datasets into a list and use merge with Reduce specifying the by as a new column created from the row names

lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), \(x) 
          transform(x, rn =row.names(x)))
out <- Reduce(function(...) merge(..., by = 'rn', all = TRUE), 
        lst1)
row.names(out) <- out[[1]]
out <- out[-1]

-output

 out
      v1 v2 v3
chr1  10  6 NA
chr2  43 64 20
chr3   1 NA 30
chr4  44 21 40
chr5 598 98 50
chr6  NA 10 60
chr7  NA 20 70

Or using tidyverse with full_join after creating a row names column with rownames_to_column (from tibble)

library(dplyr)
library(tibble)
library(purrr)
mget(ls(pattern = '^df\\d+$')) %>%
    map(~ .x %>%
             rownames_to_column('rn')) %>% 
             reduce(full_join, by = 'rn') %>% 
    column_to_rownames("rn")
      v1 v2 v3
chr1  10  6 NA
chr2  43 64 20
chr3   1 NA 30
chr4  44 21 40
chr5 598 98 50
chr6  NA 10 60
chr7  NA 20 70
akrun
  • 874,273
  • 37
  • 540
  • 662