69

How can I merge the columns of two data frames, containing a distinct set of columns but some rows with the same names? The fields for rows that don't occur in both data frames should be filled with zeros:

> d
    a   b   c   d   e   f   g   h   i  j
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10
2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1
> e
   k  l  m  n  o  p  q  r  s  t
1 11 12 13 14 15 16 17 18 19 20
3 21 22 23 24 25 26 27 28 29 30
> de
    a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s  t
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0  0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29 30
rcs
  • 67,191
  • 22
  • 172
  • 153
barbaz
  • 1,642
  • 2
  • 17
  • 27

2 Answers2

123

See ?merge:

the name "row.names" or the number 0 specifies the row names.

Example:

R> de <- merge(d, e, by=0, all=TRUE)  # merge by row names (by=0 or by="row.names")
R> de[is.na(de)] <- 0                 # replace NA values
R> de
  Row.names   a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s
1         1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19
2         2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0
3         3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29
   t
1 20
2  0
3 30
rcs
  • 67,191
  • 22
  • 172
  • 153
  • 1
    That does the job, thanks. Is there any way that is more performant? It takes minutes to do merge two columns with 200k rows, it even takes minutes to merge an empty dataframe with a single-column data frame that has 200k rows... – barbaz Oct 12 '11 at 13:45
  • 18
    and is there any way to preserve the row names? and not get them moved in a dedicated column? of course one can do `rownames(de)=de$Row.names` afterwards, just wondering if there is a way to not break it in the first place... – barbaz Oct 12 '11 at 14:44
  • what would not breaking it look like? – John Oct 12 '11 at 16:26
  • @John like `rownames(de)=de$Row.names; de=de[2:length(de)]` – barbaz Oct 13 '11 at 06:57
  • 3
    That's not possible with `merge`. See the help file: "The columns are the common columns followed by the remaining columns in x and then those in y." – rcs Oct 13 '11 at 07:18
0

Here's how I would do this with :

library(dplyr)

full_join(d |> rownames_to_column("id"),
          e |> rownames_to_column("id"), by="id") |> 
          mutate(across(everything(), ~replace_na(., 0)))

Output:

  id   a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s  t
1  1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2  2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0  0
3  3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29 30
Mark
  • 7,785
  • 2
  • 14
  • 34