The merge() function merges only two objects at a time. Since you have three matrices, you can call Reduce() to merge them cumulatively:
m1 <- matrix(c('tp53','apc','c1','c2'),2);
m2 <- matrix(c('tp53','col2a1','d1','d2'),2);
m3 <- matrix(c('tp53','wt1','e1','e2'),2);
m <- Reduce(function(x,y) merge(x,y,by=1,all=TRUE),list(m1,m2,m3));
m;
##       V1 V2.x V2.y   V2
## 1    apc   c2 <NA> <NA>
## 2   tp53   c1   d1   e1
## 3 col2a1 <NA>   d2 <NA>
## 4    wt1 <NA> <NA>   e2
merge() is not designed to combine non-key columns, so, as you can see, the c1/c2/d1/d2/e1/e2 values still sit in separate (non-leftmost) columns of the merged object. You can solve this with another line of code (or you could combine the two lines into one, since m is used only once on the right-hand side of this second line):
m <- as.data.frame(t(apply(m,1,function(x) na.omit(x)[1:length(x)])));
m;
##       V1 V2   V3   V4
## 1    apc c2 <NA> <NA>
## 2   tp53 c1   d1   e1
## 3 col2a1 d2 <NA> <NA>
## 4    wt1 e2 <NA> <NA>
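As a rough sketch of the combined form, the merge and the left-packing step can be wrapped in a single helper. The name merge_pack is hypothetical (it is not part of base R), and the sketch assumes every input is a character matrix keyed on its first column:

```r
# Hypothetical helper: merge a list of keyed matrices, then pack each row's
# non-NA values to the left and pad the remainder with NA.
merge_pack <- function(mats) {
  merged <- Reduce(function(x, y) merge(x, y, by = 1, all = TRUE), mats)
  packed <- apply(merged, 1, function(x) {
    v <- unname(x[!is.na(x)])  # keep non-missing values, drop column names
    v[seq_along(x)]            # indexing past the end pads with NA
  })
  as.data.frame(t(packed))     # apply() returns rows as columns, so transpose
}

m1 <- matrix(c('tp53','apc','c1','c2'), 2)
m2 <- matrix(c('tp53','col2a1','d1','d2'), 2)
m3 <- matrix(c('tp53','wt1','e1','e2'), 2)
res <- merge_pack(list(m1, m2, m3))
```

Using x[!is.na(x)] instead of na.omit(x) avoids carrying the na.action attribute around, but either should give the same packed rows here.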
You may notice that the row order of m does not follow the order in which the key values occurred in the input matrices. I'm not sure exactly why this happens; it appears that merge() can place unmatched rows (e.g. apc) before matched rows (e.g. tp53). A guaranteed row order is not part of the contract of merge(). In any case, you can fix this with the following (row names can be fixed up afterward as well, if necessary, via row.names()/rownames()/dimnames()):
# look up each input key (in first-occurrence order) in column 1 of m
m[match(unique(c(m1[,1],m2[,1],m3[,1])),m[,1]),];
##       V1 V2   V3   V4
## 2   tp53 c1   d1   e1
## 1    apc c2 <NA> <NA>
## 3 col2a1 d2 <NA> <NA>
## 4    wt1 e2 <NA> <NA>
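To make the reordering step easier to check in isolation, here is a small sketch on a hand-built data.frame. The data and the names keys, by_lookup and by_rank are made up for illustration. The argument order of match() matters: indexing rows directly with match(m$V1, keys) applies the inverse permutation, which only happens to be right when the reordering is a simple swap.

```r
# A stand-in for a merged result whose rows are in some arbitrary order.
m <- data.frame(V1 = c('apc', 'col2a1', 'tp53', 'wt1'),
                V2 = c('c2', 'd2', 'c1', 'e2'),
                stringsAsFactors = FALSE)

# Keys in the order they first occur in the inputs.
keys <- c('tp53', 'apc', 'col2a1', 'wt1')

# Option 1: find the row of m that holds each key (assumes keys are unique in m).
by_lookup <- m[match(keys, m$V1), ]

# Option 2: sort the rows of m by each key's rank (also works with repeated keys).
by_rank <- m[order(match(m$V1, keys)), ]

# Reset the row names to 1..n if the shuffled ones are not wanted.
rownames(by_lookup) <- NULL
```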
Notes:
- I haven't bothered with column names anywhere, since you haven't specified column names in your question. If necessary, you can set column names after the fact with names()/setNames()/colnames()/dimnames().
- Funnily enough, although merge() accepts matrix inputs, it always returns a data.frame, and although apply() accepts data.frame inputs, it never returns a data.frame (here it returns a matrix). I've added a final call to as.data.frame() in the second line of code because you've specified that you want a data.frame as the output, but you can remove that call to get a matrix as the final result.
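Both notes can be illustrated with a short sketch. The column names gene, val1 and val2 are hypothetical placeholders, since the question does not specify any:

```r
# A stand-in for the packed result.
m <- data.frame(V1 = c('tp53', 'apc'),
                V2 = c('c1', 'c2'),
                V3 = c('d1', NA),
                stringsAsFactors = FALSE)

# Set column names after the fact (placeholder names, for illustration only).
named <- setNames(m, c('gene', 'val1', 'val2'))

# merge() returns a data.frame even for matrix inputs ...
x <- matrix(c('tp53', 'apc', 'c1', 'c2'), 2)
merged_is_df <- is.data.frame(merge(x, x, by = 1))

# ... while apply() over a data.frame returns a matrix.
applied_is_matrix <- is.matrix(apply(m, 1, identity))

# Convert explicitly if a matrix is the preferred final result.
mat <- as.matrix(named)
```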