I wanted to use this solution to merge two data.tables by row name. However, it does not work.

library(data.table)

z <- matrix(c(0,0,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,1,1,1,1,0,0,0,"RND1","WDR", "PLAC8","TYBSA","GRA","TAF"), nrow=6,
    dimnames=list(c("ILMN_1651838","ILMN_1652371","ILMN_1652464","ILMN_1652952","ILMN_1653026","ILMN_1653103"),c("A","B","C","D","symbol")))

tt <-matrix(c("GO:0002009", 8, 342, 1, 0.07, 0.679, 0, 0, 1, 0, 
        "GO:0030334", 6, 343, 1, 0.07, 0.065, 0, 0, 1, 0,
        "GO:0015674", 7, 350, 1, 0.07, 0.065, 1, 0, 0, 0), nrow=10, dimnames= list(c("GO.ID","LEVEL","Annotated","Significant","Expected","resultFisher","ILMN_1652464","ILMN_1651838","ILMN_1711311","ILMN_1653026")))

z <- as.data.frame(z)
tt <- as.data.frame(tt)

setDT(z)
setDT(tt)

merge(tt,z["symbol"],by="row.names",all.x=TRUE)

I get the error:

Error in `[.data.table`(z, "symbol") : 
  When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.

How would this work in data.table?

aynber
Tom
  • `data.table` doesn't support row names (check `z` after `setDT`), so you'll need to add a column to use as the key. – George Savva Apr 06 '22 at 12:13
  • Your code works if you remove the two `setDT` calls. (I just learned that `base::merge` supports merging by row names.) – r2evans Apr 06 '22 at 12:29
  • Having said that, I have found the reliance on row names in a frame to be fragile: many (popular) R functions/packages (including `data.table` and `dplyr`) ignore or actively remove row names, so reliance on them is a risky venture. Because of this, the recommendation is almost always "convert to a proper column". – r2evans Apr 06 '22 at 12:31
  • As suggested move rownames to columns, then merge. Or keep matrix as matrix and merge based on *match*: [see another answer from your link](https://stackoverflow.com/a/6029852/680068) – zx8754 Apr 06 '22 at 12:35
  • Thanks everyone. I forgot that data.table does not support row names. I don't normally use them, not entirely sure anymore why I wanted to now. – Tom Apr 06 '22 at 12:44
  • I remember now why I wanted to use row names. I used `rowSums`, which worked nicely on a data.frame with row names, but not so much on the data.table with a character column. – Tom Apr 08 '22 at 05:42
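
On the `rowSums` point in the last comment: one way to keep the identifiers as a regular column and still sum across the numeric columns is to restrict `.SD` with `.SDcols`. This is only a sketch; the matrix `m`, its values, and the column names `id` and `total` are made up for illustration, and passing a string to `keep.rownames` assumes a reasonably recent data.table version.

```r
library(data.table)

# Toy matrix standing in for the numeric part of `z` (values are illustrative)
m <- matrix(c(0, 1, 1, 0, 1, 1), nrow = 3,
            dimnames = list(c("ILMN_1", "ILMN_2", "ILMN_3"), c("A", "B")))

# Keep the row names as a regular column called "id" (assumed name)
dt <- as.data.table(m, keep.rownames = "id")

# Sum only the numeric columns, leaving the character "id" column out
num_cols <- names(dt)[sapply(dt, is.numeric)]
dt[, total := rowSums(.SD), .SDcols = num_cols]
```

The character column never reaches `rowSums`, so it does not have to be dropped or moved back into row names first.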

1 Answer

You can merge the two matrices directly with `merge()`; the result is a data.frame whose first column, `Row.names`, holds the row names. After that you can convert it to a data.table if desired.

merged <- merge(tt, z, by = "row.names", all = TRUE)

setDT(merged)

Alternatively, convert each matrix to a data.table first, add the row names as a new column, and then merge the two data.tables on that column.

# z and tt must still be the original matrices here (before as.data.frame/setDT)
merge(
  as.data.table(z)[, id := dimnames(z)[[1]]],
  as.data.table(tt)[, id := dimnames(tt)[[1]]],
  by = "id",
  all = TRUE
)
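
For the left join asked about in the question (all rows of `tt`, plus `symbol` where the row name matches), a minimal sketch could look like the following. The small matrices `z_m` and `tt_m` are illustrative stand-ins, not the question's data, and the column name `id` is an arbitrary choice; `keep.rownames = "id"` assumes a reasonably recent data.table version.

```r
library(data.table)

# Small stand-ins for the matrices in the question (values are illustrative)
z_m <- matrix(c("0", "1", "RND1", "WDR"), nrow = 2,
              dimnames = list(c("ILMN_1651838", "ILMN_1652371"), c("A", "symbol")))
tt_m <- matrix(c("GO:0002009", "0", "GO:0030334", "1"), nrow = 2,
               dimnames = list(c("GO.ID", "ILMN_1651838"), c("V1", "V2")))

# Convert while the row names still exist, storing them in an "id" column
z_dt  <- as.data.table(z_m,  keep.rownames = "id")
tt_dt <- as.data.table(tt_m, keep.rownames = "id")

# Left join: keep every row of tt_dt, attach symbol where id matches
merged <- merge(tt_dt, z_dt[, .(id, symbol)], by = "id", all.x = TRUE)

# The same join written in data.table's own join syntax
joined <- z_dt[, .(id, symbol)][tt_dt, on = "id"]
```

Rows of `tt_dt` without a match in `z_dt` get `NA` in `symbol`. `merge()` returns the result sorted by `id`, while the `on =` form keeps the row order of `tt_dt`.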
Merijn van Tilborg
  • Thank you for your answer Merijn, but the point was that I wanted to do it in data.table (which I should have maybe explained better). – Tom Apr 08 '22 at 05:43
  • My answer uses data.table... – Merijn van Tilborg Apr 08 '22 at 07:04
  • Besides that, I also always use data.table by preference. But if in your case you show matrices as starting point (if it were data.tables you would not have row names in the first place) you take the most efficient way to prepare your data. I gave both solutions, either merge as matrices and then convert it to a data.table. Or make them a data.table first and merge them. Being a data.table or not is in that step totally irrelevant. – Merijn van Tilborg Apr 08 '22 at 07:21