1

I have these matrices of different dimensions. The key.related.sheet column in all matrices have some common and some uniques values. I want to match those common rows and merge all three matrices, but I also want to include unique rows as well. The result column should have key.related.sheet, Sample_B and trace_1,trace_2 and trace_3 columns only. Can someone please help me with this?

aa<-structure(c("S05-F13-P01:S05-F13-P01", "S05-F13-P01:S08-F10-P01", 
"S05-F13-P01:S08-F11-P01", "S05-F13-P01:S09-F66-P01", "S05-F13-P01", 
"S08-F10-P01", "S08-F11-P01", "S09-F66-P01", "1.25", "0.227", 
"-0.183", "-0.217"), .Dim = c(4L, 3L), .Dimnames = list(NULL, 
    c("key.related.sheet", "sample_B", "trace_1")))

bb<-structure(c("S05-F13-P01:S08-F10-P01", "S05-F13-P01:S08-F11-P01", 
"S05-F13-P01:S09-F66-P01", "S05-F13-P01:S09-F67-P01", "S08-F10-P01", 
"S08-F11-P01", "S09-F66-P01", "S09-F67-P01", "0.227", "-0.183", 
"-0.217", "0.292", "Unknown", "Unknown", "Unknown", "Unknown"
), .Dim = c(4L, 4L), .Dimnames = list(NULL, c("key.related.sheet", 
"sample_B", "trace_2", "type")))

cc<-structure(c("S05-F13-P01:S08-F11-P01", "S05-F13-P01:S09-F66-P01", 
"S05-F13-P01:S09-F67-P01", "S05-F13-P01:S09-F68-P01", "S05-F13-P01:S09-F01-P01", 
"S08-F11-P01", "S09-F66-P01", "S09-F67-P01", "S09-F68-P01", "S09-F01-P01", 
"-0.183", "-0.217", "0.292", "-0.314", "0.0418"), .Dim = c(5L, 
3L), .Dimnames = list(NULL, c("key.related.sheet", "sample_B", 
"trace_3")))

The expected output would be:

   key.related.sheet         sample_B      trace_1   trace_2    trace_3
 "S05-F13-P01:S05-F13-P01" "S05-F13-P01" "1.25"  
 "S05-F13-P01:S08-F10-P01" "S08-F10-P01" "0.227"      "0.227"
 "S05-F13-P01:S08-F11-P01" "S08-F11-P01" "-0.183"     "-0.183"    "-0.183"   
 "S05-F13-P01:S09-F66-P01" "S09-F66-P01" "-0.217"     "-0.217"    "-0.217"
 "S05-F13-P01:S09-F67-P01" "S09-F67-P01"              "0.292"     "0.292"
 "S05-F13-P01:S09-F68-P01" "S09-F68-P01"                          "-0.314"
 "S05-F13-P01:S09-F01-P01" "S09-F01-P01"                          "0.0418"
MAPK
  • 5,635
  • 4
  • 37
  • 88

4 Answers4

5

This can be done with the combination of Reduce and merge as follows:

Reduce(function(x, y) merge(x, y, all=TRUE), list(aa, bb[,-4], cc))

the result:

        key.related.sheet    sample_B trace_1 trace_2 trace_3
1 S05-F13-P01:S05-F13-P01 S05-F13-P01    1.25    <NA>    <NA>
2 S05-F13-P01:S08-F10-P01 S08-F10-P01   0.227   0.227    <NA>
3 S05-F13-P01:S08-F11-P01 S08-F11-P01  -0.183  -0.183  -0.183
4 S05-F13-P01:S09-F66-P01 S09-F66-P01  -0.217  -0.217  -0.217
5 S05-F13-P01:S09-F67-P01 S09-F67-P01    <NA>   0.292   0.292
6 S05-F13-P01:S09-F01-P01 S09-F01-P01    <NA>    <NA>  0.0418
7 S05-F13-P01:S09-F68-P01 S09-F68-P01    <NA>    <NA>  -0.314

Especially when you have more than three matrixes / dataframes, using merge with Reduce scales better then nested merges.

h3rm4n
  • 4,126
  • 15
  • 21
2

You can also do a full join using the merge method from base R with all = TRUE.

> merge(merge(aa,bb,all=TRUE),cc,all=TRUE)
        key.related.sheet    sample_B trace_1 trace_2    type trace_3
1 S05-F13-P01:S05-F13-P01 S05-F13-P01    1.25    <NA>    <NA>    <NA>
2 S05-F13-P01:S08-F10-P01 S08-F10-P01   0.227   0.227 Unknown    <NA>
3 S05-F13-P01:S08-F11-P01 S08-F11-P01  -0.183  -0.183 Unknown  -0.183
4 S05-F13-P01:S09-F66-P01 S09-F66-P01  -0.217  -0.217 Unknown  -0.217
5 S05-F13-P01:S09-F67-P01 S09-F67-P01    <NA>   0.292 Unknown   0.292
6 S05-F13-P01:S09-F01-P01 S09-F01-P01    <NA>    <NA>    <NA>  0.0418
7 S05-F13-P01:S09-F68-P01 S09-F68-P01    <NA>    <NA>    <NA>  -0.314

Here the merging is done w.r.t. all the common columns, i.e. key.related.sheet and sample_B - but this should be ok here, as sample_B depends on key.related.sheet?

Using by="key.related.sheet" you get the same output as in Adams answer using dplyr. Then the merging is done just w.r.t. key.related.sheet and the sample_B columns from the left and right join partner occur both in the result (i.e., are duplicated for your data)

Patrick Roocks
  • 3,129
  • 3
  • 14
  • 28
1

You could convert the matrices into data.frame and join them together with full_join command in dplyr package

library(dplyr)
for(i in c("aa","bb", "cc")) assign(i, data.frame(get(i)))
aa %>% full_join(bb, by="key.related.sheet") %>% full_join(cc,
by="key.related.sheet")

        key.related.sheet  sample_B.x trace_1  sample_B.y trace_2    type    sample_B trace_3
1 S05-F13-P01:S05-F13-P01 S05-F13-P01    1.25        <NA>    <NA>    <NA>        <NA>    <NA>
2 S05-F13-P01:S08-F10-P01 S08-F10-P01   0.227 S08-F10-P01   0.227 Unknown        <NA>    <NA>
3 S05-F13-P01:S08-F11-P01 S08-F11-P01  -0.183 S08-F11-P01  -0.183 Unknown S08-F11-P01  -0.183
4 S05-F13-P01:S09-F66-P01 S09-F66-P01  -0.217 S09-F66-P01  -0.217 Unknown S09-F66-P01  -0.217
5 S05-F13-P01:S09-F67-P01        <NA>    <NA> S09-F67-P01   0.292 Unknown S09-F67-P01   0.292
6 S05-F13-P01:S09-F68-P01        <NA>    <NA>        <NA>    <NA>    <NA> S09-F68-P01  -0.314
7 S05-F13-P01:S09-F01-P01        <NA>    <NA>        <NA>    <NA>    <NA> S09-F01-P01  0.0418
Adam Quek
  • 6,973
  • 1
  • 17
  • 23
1

Two nested merges and removal of the extraneous column

merge(merge(aa,bb[, -4], by=c("key.related.sheet", "sample_B") ,all=TRUE), 
      cc,  by=c("key.related.sheet", "sample_B") ,all=TRUE)

        key.related.sheet    sample_B trace_1 trace_2 trace_3
1 S05-F13-P01:S05-F13-P01 S05-F13-P01    1.25    <NA>    <NA>
2 S05-F13-P01:S08-F10-P01 S08-F10-P01   0.227   0.227    <NA>
3 S05-F13-P01:S08-F11-P01 S08-F11-P01  -0.183  -0.183  -0.183
4 S05-F13-P01:S09-F66-P01 S09-F66-P01  -0.217  -0.217  -0.217
5 S05-F13-P01:S09-F67-P01 S09-F67-P01    <NA>   0.292   0.292
6 S05-F13-P01:S09-F01-P01 S09-F01-P01    <NA>    <NA>  0.0418
7 S05-F13-P01:S09-F68-P01 S09-F68-P01    <NA>    <NA>  -0.314
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • 1
    Is this something different than @Patrick's answer? – Ronak Shah Apr 25 '16 at 06:17
  • The only difference was the removal of the extraneous column. Before I did that I was getting errors. I also didn't see Patrick's until after I posted. – IRTFM Apr 25 '16 at 06:27
  • Wouldn't it be more scalable (and readable) to use `Reduce()` instead of nested `merge()` calls? – mtoto Apr 25 '16 at 07:09
  • That's an excellent idea. I initially tried a `do.call` solution but it failed. Maybe _you_ should post a better solution? I think it has merit. – IRTFM Apr 25 '16 at 07:11