I have 2 data frames A and B of dimensions 2 x 5 like this:

 A = data.frame(GeneA1=c(-0.02, 1.89), GeneB2=c(0.25, 1.99), GeneB3=c(0.17, 1.87), GeneB4=c(0.3, 1.63), GeneC2=c(0.29, 1.97), row.names=c("sample 1", "sample 2"))

 B = data.frame(GeneA1=c(0.52, -0.04), GeneB1=c(1.1, 0.08), GeneB3=c(0.72, 0.03), GeneB5=c(0.78, 0.06), GeneC2=c(0.78, 0.25), row.names=c("sample 1", "sample 2"))

For both A and B, the rows are samples and the columns are gene types.

I want to combine A and B using rbind, filling in NAs where the gene types don't match up. I've heard there's a way to do this using the setdiff function, but I don't know how.

Henrik
user2846211

3 Answers

Use merge

> AB <- merge(A, B, all=TRUE)
> AB[,order(names(AB))]  # to get the result ordered by colnames 
  Gene A1 Gene B1 Gene B2 Gene B3 Gene B4 Gene B5 Gene C2
1   -0.04    0.08      NA    0.03      NA    0.06    0.25
2   -0.02      NA    0.25    0.17    0.30      NA    0.29
3    0.52    1.10      NA    0.72      NA    0.78    0.78
4    1.89      NA    1.99    1.87    1.63      NA    1.97

Where A and B are as follows:

A <- matrix(c(-0.02, 0.25, 0.17, 0.3, 0.29, 
              1.89, 1.99, 1.87, 1.63, 1.97), 
            nrow=2, byrow=TRUE,
            dimnames=list(NULL, c("Gene A1", "Gene B2", 
                                  "Gene B3", 
                                  "Gene B4", "Gene C2")))

B <- matrix(c(0.52, 1.1, 0.72, 0.78, 0.78, 
              -0.04, 0.08, 0.03, 0.06,0.25), 
            nrow=2, byrow=TRUE,
            dimnames=list(NULL, c("Gene A1", "Gene B1",
                                  "Gene B3", 
                                  "Gene B5", "Gene C2")))
Jilber Urbina
  • I've tried using merge but when I do this on bigger data frames, I often run into the error message: "Error in match.names(clabs, names(xi)) : names do not match previous names" So I was wondering if there was a way to do this using rbind? – user2846211 Nov 18 '13 at 15:03
  • @Jilber "By default the data frames are merged on the columns with names they both have". So if both dataframes have the same values on shared columns, they will get merged? Add `1.89, 999, 1.87, 999, 1.97` rows to both `A` and `B`, to see it in action. – zx8754 Nov 18 '13 at 15:21
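
For the rbind route asked about in the question, one possible sketch (not taken from any of the answers here) is to use setdiff() on the column names to find what each data frame lacks, add those columns filled with NA, and then call rbind(), which matches data frame columns by name. The data frames below reuse the question's values but drop the row names for brevity, which is an assumption of this example.

```r
# Same values as in the question, without row names (assumption for brevity)
A <- data.frame(GeneA1 = c(-0.02, 1.89), GeneB2 = c(0.25, 1.99),
                GeneB3 = c(0.17, 1.87), GeneB4 = c(0.30, 1.63),
                GeneC2 = c(0.29, 1.97))
B <- data.frame(GeneA1 = c(0.52, -0.04), GeneB1 = c(1.10, 0.08),
                GeneB3 = c(0.72, 0.03), GeneB5 = c(0.78, 0.06),
                GeneC2 = c(0.78, 0.25))

# Add the columns each data frame is missing, filled with NA
A[setdiff(names(B), names(A))] <- NA
B[setdiff(names(A), names(B))] <- NA

# rbind() on data frames matches columns by name, so their order may differ
AB <- rbind(A, B)

# Optional: sort the columns alphabetically
AB <- AB[, order(names(AB))]
print(AB)
```

Unlike merge, this keeps one output row per input row, in the original order.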

You can use the function merge:

A=data.frame(A1=c(-0.02,1.89),B2=c(0.25,1.99),B3=c(0.17,1.87),B4=c(0.3,1.63),C2=c(0.29,1.97))
B=data.frame(A1=c(0.52,-0.04),B1=c(1.1,0.08),B3=c(0.72,0.03),B5=c(0.78,0.06),C2=c(0.78,0.25))
C <- merge(A, B, all=TRUE)
View(C)
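
One thing to keep in mind, as a comment on another answer points out: without a by argument, merge joins on the columns the two data frames share, so rows that agree on all shared columns are combined into one instead of being stacked. A small sketch of that effect, where B_extra and merged are hypothetical names and the 999 values are made up for illustration:

```r
A <- data.frame(A1 = c(-0.02, 1.89), B2 = c(0.25, 1.99), B3 = c(0.17, 1.87),
                B4 = c(0.30, 1.63), C2 = c(0.29, 1.97))
B <- data.frame(A1 = c(0.52, -0.04), B1 = c(1.10, 0.08), B3 = c(0.72, 0.03),
                B5 = c(0.78, 0.06), C2 = c(0.78, 0.25))

# Hypothetical extra row in B that equals A's second row on the shared
# columns A1, B3 and C2 (the 999 values are illustrative only)
B_extra <- rbind(B, data.frame(A1 = 1.89, B1 = 999, B3 = 1.87,
                               B5 = 999, C2 = 1.97))

merged <- merge(A, B_extra, all = TRUE)

# Five input rows go in, but the two matching rows are joined into one
nrow(A) + nrow(B_extra)
nrow(merged)
```

If every sample has to stay on its own row, stacking the data frames rather than merging them avoids this.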
mdml

Try this:

# dummy data
A <- read.table(text="
Gene A1, Gene B2, Gene B3, Gene B4, Gene C2
-0.02, 0.25, 0.17, 0.3, 0.29
1.89, 1.99, 1.87, 1.63, 1.97",
                sep=",", header=TRUE)
B <- read.table(text="
Gene A1, Gene B1, Gene B3, Gene B5, Gene C2
0.52, 1.1, 0.72, 0.78, 0.78
-0.04, 0.08, 0.03, 0.06,0.25",
                sep=",", header=TRUE)

#transpose and merge
tAB <- merge(t(A),t(B),by="row.names",all=TRUE)

#keep gene names
col <- tAB[,1]

#exclude rownames, transpose
output <- t(tAB[,-1])

#update colnames
colnames(output) <- col

#output
#     Gene.A1 Gene.B1 Gene.B2 Gene.B3 Gene.B4 Gene.B5 Gene.C2
#V1.x   -0.02      NA    0.25    0.17    0.30      NA    0.29
#V2.x    1.89      NA    1.99    1.87    1.63      NA    1.97
#V1.y    0.52    1.10      NA    0.72      NA    0.78    0.78
#V2.y   -0.04    0.08      NA    0.03      NA    0.06    0.25
zx8754