Using rbind to merge data frames

Question

I have 2 data frames A and B of dimensions 2 x 5 like this:

 A = data.frame(GeneA1=c(-0.02, 1.89), GeneB2=c(0.25, 1.99), GeneB3=c(0.17, 1.87), GeneB4=c(0.3, 1.63), GeneC2=c(0.29, 1.97), row.names=c("sample 1", "sample 2"))

 B = data.frame(GeneA1=c(0.52, -0.04), GeneB1=c(1.1, 0.08), GeneB3=c(0.72, 0.03), GeneB5=c(0.78, 0.06), GeneC2=c(0.78, 0.25), row.names=c("sample 1", "sample 2"))

For both A and B, the rows are samples and the columns are gene types.

I want to combine A and B using rbind, filling in NAs where the gene types don't match up. I've heard there's a way to do this using the setdiff function, but I don't know how.

What do you mean by 'reproducible'? The example data I've provided illustrates my problem? — user2846211, Nov 18 '13 at 14:33
Please use R syntax in your code. What you show might be Matlab code? — Roland, Nov 18 '13 at 14:34
FYI, information on posting on SO and writing reproducible examples can be found [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Have you looked at merge()? — B. Davis, Nov 18 '13 at 14:41
Similar question: http://stackoverflow.com/questions/6029743/r-merge-or-combine-by-rownames — zx8754, Nov 18 '13 at 15:10

score 3 · Answer 1 · answered Nov 18 '13 at 14:59

Use merge

> AB <- merge(A, B, all=TRUE)
> AB[,order(names(AB))]  # to get the result ordered by colnames 
  Gene A1 Gene B1 Gene B2 Gene B3 Gene B4 Gene B5 Gene C2
1   -0.04    0.08      NA    0.03      NA    0.06    0.25
2   -0.02      NA    0.25    0.17    0.30      NA    0.29
3    0.52    1.10      NA    0.72      NA    0.78    0.78
4    1.89      NA    1.99    1.87    1.63      NA    1.97

Where A and B are as follows:

A <- matrix(c(-0.02, 0.25, 0.17, 0.3, 0.29, 
              1.89, 1.99, 1.87, 1.63, 1.97), 
            nrow=2, byrow=TRUE,
            dimnames=list(NULL, c("Gene A1", "Gene B2", 
                                  "Gene B3", 
                                  "Gene B4", "Gene C2")))

B <- matrix(c(0.52, 1.1, 0.72, 0.78, 0.78, 
              -0.04, 0.08, 0.03, 0.06,0.25), 
            nrow=2, byrow=TRUE,
            dimnames=list(NULL, c("Gene A1", "Gene B1",
                                  "Gene B3", 
                                  "Gene B5", "Gene C2")))

I've tried using merge but when I do this on bigger data frames, I often run into the error message: "Error in match.names(clabs, names(xi)) : names do not match previous names" So I was wondering if there was a way to do this using rbind? — user2846211, Nov 18 '13 at 15:03
@Jilber "By default the data frames are merged on the columns with names they both have". So if both dataframes have the same values on shared columns, they will get merged? Add `1.89, 999, 1.87, 999, 1.97` rows to both `A` and `B`, to see it in action. — zx8754, Nov 18 '13 at 15:21
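
The behaviour described in the comment above can be reproduced with a minimal sketch. It assumes the A and B from the accepted answer (column names A1 to C2) and appends the row suggested in the comment to both; `extra_A`, `extra_B`, `A_ext` and `B_ext` are illustrative names, not part of the thread.

```r
# Data as in the accepted answer
A <- data.frame(A1 = c(-0.02, 1.89), B2 = c(0.25, 1.99), B3 = c(0.17, 1.87),
                B4 = c(0.30, 1.63), C2 = c(0.29, 1.97))
B <- data.frame(A1 = c(0.52, -0.04), B1 = c(1.10, 0.08), B3 = c(0.72, 0.03),
                B5 = c(0.78, 0.06), C2 = c(0.78, 0.25))

# Append the row from the comment (1.89, 999, 1.87, 999, 1.97) to both frames
extra_A <- data.frame(A1 = 1.89, B2 = 999, B3 = 1.87, B4 = 999, C2 = 1.97)
extra_B <- data.frame(A1 = 1.89, B1 = 999, B3 = 1.87, B5 = 999, C2 = 1.97)
A_ext <- rbind(A, extra_A)
B_ext <- rbind(B, extra_B)

# merge() joins on the shared columns (A1, B3, C2) by default
AB_ext <- merge(A_ext, B_ext, all = TRUE)

# Stacking 3 + 3 samples should give 6 rows, but rows that agree on
# A1, B3 and C2 are joined into one row, so fewer rows come back.
nrow(AB_ext)
```

Here two rows of `A_ext` match the appended row of `B_ext` on the shared columns, so the result has 5 rows rather than 6, and the original second sample of A picks up the 999 values from B. If every sample must stay on its own row, merging on the shared columns is not a safe substitute for a row bind.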

score 1 · Accepted Answer · edited Nov 18 '13 at 15:39

You can use the function merge:

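A <- data.frame(A1 = c(-0.02, 1.89), B2 = c(0.25, 1.99), B3 = c(0.17, 1.87), B4 = c(0.3, 1.63), C2 = c(0.29, 1.97))
B <- data.frame(A1 = c(0.52, -0.04), B1 = c(1.1, 0.08), B3 = c(0.72, 0.03), B5 = c(0.78, 0.06), C2 = c(0.78, 0.25))
# all = TRUE keeps the unmatched rows of both data frames (full outer join)
C <- merge(A, B, all = TRUE)
View(C)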

answered Nov 18 '13 at 15:18 by Marie-Anne Frenken · edited Nov 18 '13 at 15:39 by mdml

See my comment to Jilber's answer. – zx8754 Nov 18 '13 at 15:30
@Marie-Anne Frenken what's the difference between your answer and mine?? Is it just a duplicated answer? – Jilber Urbina Nov 18 '13 at 15:43

score 0 · Answer 3 · answered Nov 18 '13 at 15:23

Try this:

# dummy data
A <- read.table(text="
Gene A1, Gene B2, Gene B3, Gene B4, Gene C2
-0.02, 0.25, 0.17, 0.3, 0.29
1.89, 1.99, 1.87, 1.63, 1.97",
                sep=",", header=TRUE)
B <- read.table(text="
Gene A1, Gene B1, Gene B3, Gene B5, Gene C2
0.52, 1.1, 0.72, 0.78, 0.78
-0.04, 0.08, 0.03, 0.06,0.25",
                sep=",", header=TRUE)

#transpose and merge
tAB <- merge(t(A),t(B),by="row.names",all=TRUE)

#keep gene names
col <- tAB[,1]

#exclude rownames, transpose
output <- t(tAB[,-1])

#update colnames
colnames(output) <- col

#output
#     Gene.A1 Gene.B1 Gene.B2 Gene.B3 Gene.B4 Gene.B5 Gene.C2
#V1.x   -0.02      NA    0.25    0.17    0.30      NA    0.29
#V2.x    1.89      NA    1.99    1.87    1.63      NA    1.97
#V1.y    0.52    1.10      NA    0.72      NA    0.78    0.78
#V2.y   -0.04    0.08      NA    0.03      NA    0.06    0.25
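
For the rbind route the question asks about, here is a minimal sketch using setdiff. It assumes the question's data with default row names; `rbind_fill` is a hypothetical helper name, not a base R function and not something proposed in the thread. The idea is to add the columns each data frame lacks as NA, after which rbind no longer stops with "names do not match previous names".

```r
# Hypothetical helper: row-bind two data frames, adding an NA column
# wherever one data frame lacks a column that the other has.
rbind_fill <- function(x, y) {
  for (nm in setdiff(names(y), names(x))) x[[nm]] <- NA  # columns only in y
  for (nm in setdiff(names(x), names(y))) y[[nm]] <- NA  # columns only in x
  rbind(x, y)  # rbind() on data frames matches columns by name
}

A <- data.frame(GeneA1 = c(-0.02, 1.89), GeneB2 = c(0.25, 1.99),
                GeneB3 = c(0.17, 1.87), GeneB4 = c(0.30, 1.63),
                GeneC2 = c(0.29, 1.97))
B <- data.frame(GeneA1 = c(0.52, -0.04), GeneB1 = c(1.10, 0.08),
                GeneB3 = c(0.72, 0.03), GeneB5 = c(0.78, 0.06),
                GeneC2 = c(0.78, 0.25))

AB <- rbind_fill(A, B)
AB <- AB[, order(names(AB))]  # optional: sort the columns by gene name
print(AB)
```

Unlike merge, this never joins rows, so the result always has nrow(A) + nrow(B) rows, with the samples of A first. If installing a package is an option, plyr::rbind.fill and dplyr::bind_rows fill missing columns with NA in a similar way.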
