55

In the example below I have two datasets (Z and A). I want to merge or combine these sets by the ILMN numbers. If there is no match, fill in NA.

z <- matrix(c(0,0,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,1,1,1,1,0,0,0,"RND1","WDR", "PLAC8","TYBSA","GRA","TAF"), nrow=6,
    dimnames=list(c("ILMN_1651838","ILMN_1652371","ILMN_1652464","ILMN_1652952","ILMN_1653026","ILMN_1653103"),c("A","B","C","D","symbol")))

t<-matrix(c("GO:0002009", 8, 342, 1, 0.07, 0.679, 0, 0, 1, 0, 
        "GO:0030334", 6, 343, 1, 0.07, 0.065, 0, 0, 1, 0,
        "GO:0015674", 7, 350, 1, 0.07, 0.065, 1, 0, 0, 0), nrow=10, dimnames= list(c("GO.ID","LEVEL","Annotated","Significant","Expected","resultFisher","ILMN_1652464","ILMN_1651838","ILMN_1711311","ILMN_1653026")))

The result will be like this:

             [,1]         [,2]         [,3]         [,4]
GO.ID        "GO:0002009" "GO:0030334" "GO:0015674"  NA
LEVEL        "8"          "6"          "7"           NA
Annotated    "342"        "343"        "350"         NA
Significant  "1"          "1"          "1"           NA
Expected     "0.07"       "0.07"       "0.07"        NA
resultFisher "0.679"      "0.065"      "0.065"       NA
ILMN_1652464 "0"          "0"          "1"           PLAC8
ILMN_1651838 "0"          "0"          "0"           RND1
ILMN_1711311 "1"          "1"          "0"           NA
ILMN_1653026 "0"          "0"          "0"           GRA
zx8754
  • 52,746
  • 12
  • 114
  • 209
Lisann
  • 5,705
  • 14
  • 41
  • 50

5 Answers5

71

Using merge and renaming your t vector as tt (see the PS of Andrie) :

merge(tt,z,by="row.names",all.x=TRUE)[,-(5:8)]

Now if you would work with dataframes instead of matrices, this would even become a whole lot easier :

z <- as.data.frame(z)
tt <- as.data.frame(tt)
merge(tt,z["symbol"],by="row.names",all.x=TRUE)
Joris Meys
  • 106,551
  • 31
  • 221
  • 263
44

Use match to return your desired vector, then cbind it to your matrix

cbind(t, z[, "symbol"][match(rownames(t), rownames(z))])

             [,1]         [,2]         [,3]         [,4]   
GO.ID        "GO:0002009" "GO:0030334" "GO:0015674" NA     
LEVEL        "8"          "6"          "7"          NA     
Annotated    "342"        "343"        "350"        NA     
Significant  "1"          "1"          "1"          NA     
Expected     "0.07"       "0.07"       "0.07"       NA     
resultFisher "0.679"      "0.065"      "0.065"      NA     
ILMN_1652464 "0"          "0"          "1"          "PLAC8"
ILMN_1651838 "0"          "0"          "0"          "RND1" 
ILMN_1711311 "1"          "1"          "0"          NA     
ILMN_1653026 "0"          "0"          "0"          "GRA"  

PS. Be warned that t is base R function that is used to transpose matrices. By creating a variable called t, it can lead to confusion in your downstream code.

Andrie
  • 176,377
  • 47
  • 447
  • 496
  • Your answer is very useful thank you. The only thing is that my code does not give the rigth output. If i only take this: z[, "symbol"][match(rownames(t), rownames(z))] a factor is created with NA and symbols but when i do cbind, the symbol number are replaced by a rondom value. Do anyone know that is wrong? Thanks – Lisann May 17 '11 at 11:19
3

Not perfect but close:

newcol<-sapply(rownames(t), function(rn){z[match(rn, rownames(z)), 5]})
cbind(data.frame(t), newcol)
Nick Sabbe
  • 11,684
  • 1
  • 43
  • 57
2
cbind.fill <- function(x, y){
  xrn <- rownames(x)
  yrn <- rownames(y)
  rn <- union(xrn, yrn)
  xcn <- colnames(x)
  ycn <- colnames(y)
  if(is.null(xrn) | is.null(yrn) | is.null(xcn) | is.null(ycn)) 
    stop("NULL rownames or colnames")
  z <- matrix(NA, nrow=length(rn), ncol=length(xcn)+length(ycn))
  rownames(z) <- rn
  colnames(z) <- c(xcn, ycn)
  idx <- match(rn, xrn)
  z[!is.na(idx), 1:length(xcn)] <- x[na.omit(idx),]
  idy <- match(rn, yrn)
  z[!is.na(idy), length(xcn)+(1:length(ycn))] <- y[na.omit(idy),]
  return(z)
}
Wojciech Sobala
  • 7,431
  • 2
  • 21
  • 27
2

you can wrap -Andrie answer into a generic function

mbind<-function(...){
 Reduce( function(x,y){cbind(x,y[match(row.names(x),row.names(y)),])}, list(...) )
}

Here, you can bind multiple frames with rownames as key

R.S.
  • 2,093
  • 14
  • 29