Concatenating rows where the first item matches

Question

Let's say I have a file with two columns labeled A and B. Each column consists of different strings, with repetition allowed. The A column is already sorted. Here is an example:

A       B
c1045   GO:0003735
c1045   GO:0005829
c1045   GO:0005840
c1045   GO:0006412
c1045   GO:0019843
c11467  GO:0003735
c11467  GO:0005840
c11467  GO:0006412
c1168   GO:0006950
c1168   GO:0006950
c1175   GO:0003674
c1175   GO:0003729
c1175   GO:0003735
c1175   GO:0006412

I want to create a new file in which each string in the A column appears only once, with all of its corresponding B strings concatenated into a single comma-separated value.

The resulting file will begin with:

A       B
c1045   GO:0003735,GO:0005829,GO:0005840,GO:0006412,GO:0019843.
c11467  GO:0003735,GO:0005840,GO:0006412.

Is there an easy way to do this in R?

Sorry, I didn't have the right keyword: aggregate, not concatenate... — bela83, Feb 23 '15 at 22:39

score 3 · Accepted Answer · answered Feb 23 '15 at 22:28

Is this what you are looking for?

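library(data.table)
dt <- data.table(df)  # df is defined under "Data:" below
## Collapse every non-grouping column into one comma-separated string per value of A
dt[, lapply(.SD, function(x) {
    paste0(x, collapse = ",")
  }), by = A]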
        A                                                      B
1:  c1045 GO:0003735,GO:0005829,GO:0005840,GO:0006412,GO:0019843
2: c11467                       GO:0003735,GO:0005840,GO:0006412
3:  c1168                                  GO:0006950,GO:0006950
4:  c1175            GO:0003674,GO:0003729,GO:0003735,GO:0006412

Data:

df <- read.table(
  text="A       B
c1045   GO:0003735
c1045   GO:0005829
c1045   GO:0005840
c1045   GO:0006412
c1045   GO:0019843
c11467  GO:0003735
c11467  GO:0005840
c11467  GO:0006412
c1168   GO:0006950
c1168   GO:0006950
c1175   GO:0003674
c1175   GO:0003729
c1175   GO:0003735
c1175   GO:0006412",
  header=TRUE,
  stringsAsFactors=FALSE)
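
For comparison, the same grouping can probably be done in base R with aggregate(), the keyword mentioned in the comment under the question. This is only a sketch, not part of the original answer: it rebuilds a shortened copy of the example data so it runs on its own, and the output file name "collapsed.tsv" is a hypothetical placeholder.

```r
# Shortened copy of the example data (column names A and B as in the question)
df <- data.frame(
  A = c("c1045", "c1045", "c11467", "c1168", "c1168"),
  B = c("GO:0003735", "GO:0005829", "GO:0003735", "GO:0006950", "GO:0006950"),
  stringsAsFactors = FALSE
)

# Base R: collapse B into one comma-separated string per value of A.
# Extra arguments (here collapse) are passed through to FUN.
res <- aggregate(B ~ A, data = df, FUN = paste, collapse = ",")
print(res)

# Optional: write the result to a tab-separated file.
# "collapsed.tsv" is a hypothetical file name, adjust as needed.
# write.table(res, "collapsed.tsv", sep = "\t", quote = FALSE, row.names = FALSE)
```

Repeated values are kept as they are (c1168 lists GO:0006950 twice). If duplicates should be dropped, replacing FUN with function(x) paste(unique(x), collapse = ",") should do it.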

Looks like it! I'll upvote for now and accept tomorrow, because I can't test it right now... — bela83, Feb 23 '15 at 22:36
Good. Bonus point: it made me learn how to install an R package. I upvote! — bela83, Feb 24 '15 at 08:27
