Is there an R function for collapsing characters into one cell if they have a matching character in another cell?

Question

I have a dataframe with two columns of characters that looks like this:

name	gene
GO:00001	Gene_1
GO:00001	Gene_2
GO:00002	Gene_3
GO:00002	Gene_4
GO:00002	Gene_5

But I need to collapse the columns so that the "name" column isn't repetitive and the "gene" column contains each gene that matches to the same "name", separated by a comma and a space, like so:

name	gene
GO:00001	Gene_1, Gene_2
GO:00002	Gene_3, Gene_4, Gene_5

I have looked into the documentation for melt, collapse, and summarize, but I can't figure out how to do this with characters. Any help is much appreciated!!

score 0 · Accepted Answer · answered Feb 06 '23 at 21:06

Using dplyr, with collapse = ", " to get the comma-and-space separator the question asks for:

library(dplyr)

df %>%
    group_by(name) %>%
    summarise(gene = paste(gene, collapse = ", "))
# A tibble: 2 × 2
  name     gene                  
  <chr>    <chr>                 
1 GO:00001 Gene_1, Gene_2        
2 GO:00002 Gene_3, Gene_4, Gene_5

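Using base R (paste0 on its own does not join the values into one string; collapse has to be supplied):

aggregate(gene ~ name, data = df, FUN = function(x) paste(x, collapse = ", "))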
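For anyone who wants to try the base R route end to end, here is a minimal self-contained sketch. It rebuilds the example data frame from the question (the object names df and collapsed are just illustrative) and uses toString, which is base R shorthand for paste(x, collapse = ", "), as the aggregation function.

```r
# Rebuild the example data from the question; keep both columns as character
df <- data.frame(
  name = c("GO:00001", "GO:00001", "GO:00002", "GO:00002", "GO:00002"),
  gene = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  stringsAsFactors = FALSE
)

# Collapse all genes sharing a name into one comma-and-space separated string
collapsed <- aggregate(gene ~ name, data = df, FUN = toString)
print(collapsed)
```

The result should have one row per distinct name, with the gene column holding a single string per group.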
