Merge data frames based on rownames in R

Question

How can I merge the columns of two data frames, containing a distinct set of columns but some rows with the same names? The fields for rows that don't occur in both data frames should be filled with zeros:

> d
    a   b   c   d   e   f   g   h   i  j
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10
2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1
> e
   k  l  m  n  o  p  q  r  s  t
1 11 12 13 14 15 16 17 18 19 20
3 21 22 23 24 25 26 27 28 29 30
> de
    a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s  t
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0  0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29 30

rcs · Accepted Answer · 2014-09-19T13:25:58.120

123

See ?merge:

the name "row.names" or the number 0 specifies the row names.

Example:

R> de <- merge(d, e, by=0, all=TRUE)  # merge by row names (by=0 or by="row.names")
R> de[is.na(de)] <- 0                 # replace NA values
R> de
  Row.names   a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s
1         1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19
2         2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0
3         3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29
   t
1 20
2  0
3 30

edited Sep 19 '14 at 13:25

answered Oct 12 '11 at 12:02

rcs

67,191
22
172
153

1

That does the job, thanks. Is there any way that is more performant? It takes minutes to do merge two columns with 200k rows, it even takes minutes to merge an empty dataframe with a single-column data frame that has 200k rows... – barbaz Oct 12 '11 at 13:45
18

and is there any way to preserve the row names? and not get them moved in a dedicated column? of course one can do `rownames(de)=de$Row.names` afterwards, just wondering if there is a way to not break it in the first place... – barbaz Oct 12 '11 at 14:44
what would not breaking it look like? – John Oct 12 '11 at 16:26
@John like `rownames(de)=de$Row.names; de=de[2:length(de)]` – barbaz Oct 13 '11 at 06:57
3

That's not possible with `merge`. See the help file: "The columns are the common columns followed by the remaining columns in x and then those in y." – rcs Oct 13 '11 at 07:18

score 0 · Answer 2 · answered Aug 21 '23 at 02:48

Here's how I would do this with dplyr:

library(dplyr)

full_join(d |> rownames_to_column("id"),
          e |> rownames_to_column("id"), by="id") |> 
          mutate(across(everything(), ~replace_na(., 0)))

Output:

  id   a   b   c   d   e   f   g   h   i  j  k  l  m  n  o  p  q  r  s  t
1  1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10 11 12 13 14 15 16 17 18 19 20
2  2 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9  1  0  0  0  0  0  0  0  0  0  0
3  3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0  0 21 22 23 24 25 26 27 28 29 30

Merge data frames based on rownames in R

2 Answers2

Linked

Related