
I have a data frame with two character columns that looks like this:

name gene
GO:00001 Gene_1
GO:00001 Gene_2
GO:00002 Gene_3
GO:00002 Gene_4
GO:00002 Gene_5

I need to collapse the rows so that each value in the "name" column appears only once, and the "gene" column lists every gene that maps to that name, separated by a comma and a space, like so:

name gene
GO:00001 Gene_1, Gene_2
GO:00002 Gene_3, Gene_4, Gene_5

I have looked into the documentation for melt, collapse, and summarize, but I can't figure out how to apply them to character columns. Any help is much appreciated!
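To make the example reproducible, the sample data above can be built as a data frame named df. This is a sketch that assumes both columns are plain character vectors, as the question describes:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps both columns as character
df <- data.frame(
  name = c("GO:00001", "GO:00001", "GO:00002", "GO:00002", "GO:00002"),
  gene = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  stringsAsFactors = FALSE
)
print(df)
```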

LuLuGaGa


Using dplyr:

> library(dplyr)
> df %>% 
    group_by(name) %>% 
    summarise(gene = paste(gene, collapse = ", "))
# A tibble: 2 × 2
  name     gene                  
  <chr>    <chr>                 
1 GO:00001 Gene_1, Gene_2        
2 GO:00002 Gene_3, Gene_4, Gene_5

Using base R:

aggregate(gene ~ name, data = df, FUN = paste, collapse = ", ")
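Another base R route, offered here as an alternative sketch rather than part of the original answer, is to split the genes by name and collapse each group into a single string with vapply(). The object names collapsed and result are illustrative only:

```r
# Rebuild the sample data so this block runs on its own
df <- data.frame(
  name = c("GO:00001", "GO:00001", "GO:00002", "GO:00002", "GO:00002"),
  gene = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  stringsAsFactors = FALSE
)

# split() groups the genes by name; vapply() pastes each group into one
# string, passing collapse = ", " through to paste()
collapsed <- vapply(split(df$gene, df$name), paste, character(1), collapse = ", ")

# Turn the named character vector back into a two-column data frame
result <- data.frame(
  name = names(collapsed),
  gene = unname(collapsed),
  stringsAsFactors = FALSE
)
print(result)
```

Note that split() orders the groups by the sorted values of name, which matches the order in the question here but may differ from the original row order in other data.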
Jilber Urbina