merge the rows in R with the same row name concatenating the content in the column

Question

I need help merging the rows with the same name by concatenating the content in one of the columns. For example, in my dataframe,df, the rows with the same name match completely across the columns except in col 3. I want to merge the rows with the same rowname and concatenate the contents in col3 separated by a comma and get the result as shown below. Thank you for your help.

df

rowname  col1    col2  col3 
pat      122       A      T
bus      222       G      C
pat      122       A      G


result

rowname  col1    col2  col3 
pat      122       A      T,G
bus      222       G      C

I mean I just want to merge all the column as they are same if the rowname matches . Only the contents in col3 differ. — MAPK, Jun 24 '15 at 09:00
Sorry I just saw that there is a typo in col1. correction: 122 is in both 1 and third rows of col1 . Thanks! — MAPK, Jun 24 '15 at 09:04
Can you update with an example using `dput` i..e `dput(droplevels(head(yourdata,20)))` Copy/paste the output of the dput in your post — akrun, Jun 24 '15 at 09:18

score 2 · Answer 1 · answered Jun 24 '15 at 09:04

2

Try

aggregate(col3~., df, FUN=toString)
#   rowname col1 col2 col3
#1     pat  122    A T, G
#2     bus  222    G    C

Or using dplyr

library(dplyr)
 df %>%
    group_by_(.dots=names(df)[1:3]) %>%
    summarise(col3=toString(col3))
#  rowname col1 col2 col3
#1     bus  222    G    C
#2     pat  122    A T, G

data

df <- structure(list(rowname = c("pat", "bus", "pat"), col1 = c(122, 
222, 122), col2 = c("A", "G", "A"), col3 = c("T", "C", "G")), 
.Names = c("rowname", 
"col1", "col2", "col3"), row.names = c(NA, -3L), class = "data.frame")

answered Jun 24 '15 at 09:04

akrun

874,273
37
540
662

It seems to work, but it doesn't delete the rows that are already present. Doesn't work as you have shown(with my data!). – MAPK Jun 24 '15 at 09:09
@user277653 The code is based on the example you showed. If you data is different, you may need to show an example that mimics your original data – akrun Jun 24 '15 at 09:10
> library(dplyr) > df %>% + group_by_(.dots=names(df)[1:3]) %>% + summarise(col3=toString(col3)) my Result: col3 1 T, C, G – MAPK Jun 24 '15 at 09:18
@user277653 Here, I used `names(df)[1:3]` as the last column i.e. 4th is col3 and we are taking all other columns as the grouping variable. In your original dataset, how many columns are present and what are the grouping variables? – akrun Jun 24 '15 at 09:19
There are altogether 8 columns and the variation is in the 5th column only. – MAPK Jun 24 '15 at 09:21
@user277653 Then use `names(df)[-5]` – akrun Jun 24 '15 at 09:23

merge the rows in R with the same row name concatenating the content in the column

1 Answers1

data