
Let's say I have a matrix or data frame with two columns:

    marker <- c("A1", "A2", "A2", "A3")  
    gene <- c("gene1", "gene2", "gene3", "gene4")  
    cbind(marker, gene)  

         marker gene   
    [1,] "A1"   "gene1"
    [2,] "A2"   "gene2"
    [3,] "A2"   "gene3"
    [4,] "A3"   "gene4"

How can I convert this into a matrix or data frame that has one row for each unique marker and all associated genes? Ideally, I would like to get something like this:

         marker gene          
    [1,] "A1"   "gene1"       
    [2,] "A2"   "gene2";"gene3"
    [3,] "A3"   "gene4" 
milan

1 Answer


What about this?

    # Group the genes by marker, then collapse each group into one ";"-separated string
    spl <- split(gene, marker)
    data.frame(name = names(spl), gene = do.call(c, lapply(spl, function(x) paste0(x, collapse = ";"))))

       name        gene
    A1   A1       gene1
    A2   A2 gene2;gene3
    A3   A3       gene4
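To make the two steps easier to follow, here is a minimal sketch of the same split-then-collapse idea written out step by step. It swaps `do.call(c, lapply(...))` for `vapply()`, which is an assumption on my part rather than something the code above requires; the names `collapsed` and `result` are only illustrative.

```r
marker <- c("A1", "A2", "A2", "A3")
gene <- c("gene1", "gene2", "gene3", "gene4")

# Step 1: split() returns a named list with one character vector per marker
spl <- split(gene, marker)

# Step 2: paste each group together; vapply() guarantees one string per group
collapsed <- vapply(spl, paste, FUN.VALUE = character(1), collapse = ";")

# Step 3: assemble the result, one row per unique marker
result <- data.frame(
  marker = names(collapsed),
  gene = unname(collapsed),
  stringsAsFactors = FALSE  # keeps character columns on R versions before 4.0
)
print(result)
```

Using `unname()` drops the names from `collapsed`, so the data frame gets default row names instead of repeating the marker values there.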
DatamineR
  • The output is correct (+1) but I personally find `aggregate(gene ~ name, df, paste, collapse = "; ")` much easier to read and write. – talat Feb 25 '15 at 18:55
  • @docendodiscimus Sure, me too! :) This is just as an alternative. Was considering to delete it... – DatamineR Feb 25 '15 at 19:31
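
For reference, the `aggregate()` call from the comment can be sketched against the question's data roughly as follows. This assumes the two vectors are first put into a data frame (called `df` here, a name the question does not define) and uses the question's column name `marker` in place of `name`; the separator is `";"` to match the requested output rather than the `"; "` in the comment.

```r
marker <- c("A1", "A2", "A2", "A3")
gene <- c("gene1", "gene2", "gene3", "gene4")

# Assumed setup: a data frame holding the two columns from the question
df <- data.frame(marker = marker, gene = gene, stringsAsFactors = FALSE)

# Formula interface: collapse `gene` within each level of `marker`.
# Extra arguments (here `collapse`) are passed through to paste().
out <- aggregate(gene ~ marker, data = df, FUN = paste, collapse = ";")
print(out)
```

The result is a data frame with one row per unique marker and the collapsed genes in the `gene` column.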