Collapse data frame by unique values and combine all related values of other variable

Question

Let's say I have a matrix or data frame with two columns:

    marker <- c("A1", "A2", "A2", "A3")
    gene <- c("gene1", "gene2", "gene3", "gene4")
    cbind(marker, gene)

         marker gene   
    [1,] "A1"   "gene1"
    [2,] "A2"   "gene2"
    [3,] "A2"   "gene3"
    [4,] "A3"   "gene4"

How can I convert this into a matrix or data frame that has one row for each unique marker and all associated genes? Ideally, I would like to get something like this:

         marker gene          
    [1,] "A1"   "gene1"       
    [2,] "A2"   "gene2";"gene3"
    [3,] "A3"   "gene4"

In the answers to the duplicate question, just change `collapse = ''` to `collapse = '; '`. — talat, Feb 25 '15 at 18:29
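To see what that change does, here is a minimal sketch that collapses the two genes attached to marker `A2` in the question's data (the variable names `a2_genes`, `no_sep` and `with_sep` are just illustrative):

```r
# Genes that share marker "A2" in the question's data
a2_genes <- c("gene2", "gene3")

# collapse = "" glues the elements together with nothing in between
no_sep <- paste(a2_genes, collapse = "")
print(no_sep)    # [1] "gene2gene3"

# collapse = "; " puts a semicolon and a space between the elements
with_sep <- paste(a2_genes, collapse = "; ")
print(with_sep)  # [1] "gene2; gene3"
```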

score 3 · Accepted Answer · answered Feb 25 '15 at 18:32


What about this?

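    # Split the genes into a list with one element per unique marker
    spl <- split(gene, marker)
    # Paste each element's genes into a single ";"-separated string
    data.frame(name = names(spl), gene = do.call(c, lapply(spl, function(x) paste0(x, collapse = ";"))))
       name        gene
    A1   A1       gene1
    A2   A2 gene2;gene3
    A3   A3       gene4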
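The same split-then-paste idea can also be written with `vapply()` instead of `do.call(c, lapply(...))`, which guarantees exactly one string per marker. This is a sketch, not the answerer's code: it names the first column `marker` to match the question and sets `stringsAsFactors = FALSE` so that R versions before 4.0 keep both columns as character.

```r
marker <- c("A1", "A2", "A2", "A3")
gene <- c("gene1", "gene2", "gene3", "gene4")

# One list element per unique marker, holding that marker's genes
spl <- split(gene, marker)

# Collapse each element to one string; character(1) makes vapply()
# stop with an error if any result is not exactly one string
collapsed <- vapply(spl, paste, character(1), collapse = ";")

res <- data.frame(marker = names(collapsed),
                  gene = unname(collapsed),
                  stringsAsFactors = FALSE)
print(res)
#   marker        gene
# 1     A1       gene1
# 2     A2 gene2;gene3
# 3     A3       gene4
```

Using `unname()` on the collapsed vector keeps the default row names `1, 2, 3` instead of repeating the marker as a row name, as in the output above.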


DatamineR



The output is correct (+1) but I personally find `aggregate(gene ~ name, df, paste, collapse = "; ")` much easier to read and write. – talat Feb 25 '15 at 18:55
@docendodiscimus Sure, me too! :) This is just meant as an alternative. I was considering deleting it... – DatamineR Feb 25 '15 at 19:31
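For reference, here is how the `aggregate()` suggestion looks when applied to the question's vectors. The comment's formula refers to a data frame `df` with a column `name`; neither exists in the question, so this sketch builds a hypothetical `df` from `marker` and `gene` and groups on `marker` instead:

```r
marker <- c("A1", "A2", "A2", "A3")
gene <- c("gene1", "gene2", "gene3", "gene4")

# Hypothetical data frame built from the question's vectors
df <- data.frame(marker, gene, stringsAsFactors = FALSE)

# Formula interface: collapse `gene` within each value of `marker`;
# the extra argument collapse = "; " is passed on to paste()
res <- aggregate(gene ~ marker, data = df, FUN = paste, collapse = "; ")
print(res)
#   marker         gene
# 1     A1        gene1
# 2     A2 gene2; gene3
# 3     A3        gene4
```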

