1

I need help merging the rows with the same name by concatenating the content in one of the columns. For example, in my dataframe,df, the rows with the same name match completely across the columns except in col 3. I want to merge the rows with the same rowname and concatenate the contents in col3 separated by a comma and get the result as shown below. Thank you for your help.

df

rowname  col1    col2  col3 
pat      122       A      T
bus      222       G      C
pat      122       A      G


result

rowname  col1    col2  col3 
pat      122       A      T,G
bus      222       G      C
MAPK
  • 5,635
  • 4
  • 37
  • 88
  • I mean I just want to merge all the column as they are same if the rowname matches . Only the contents in col3 differ. – MAPK Jun 24 '15 at 09:00
  • Sorry I just saw that there is a typo in col1. correction: 122 is in both 1 and third rows of col1 . Thanks! – MAPK Jun 24 '15 at 09:04
  • Can you update with an example using `dput` i..e `dput(droplevels(head(yourdata,20)))` Copy/paste the output of the dput in your post – akrun Jun 24 '15 at 09:18
  • Did that worked (`names(df)[-5]`)? – akrun Jun 24 '15 at 09:27

1 Answers1

2

Try

aggregate(col3~., df, FUN=toString)
#   rowname col1 col2 col3
#1     pat  122    A T, G
#2     bus  222    G    C

Or using dplyr

library(dplyr)
 df %>%
    group_by_(.dots=names(df)[1:3]) %>%
    summarise(col3=toString(col3))
#  rowname col1 col2 col3
#1     bus  222    G    C
#2     pat  122    A T, G

data

df <- structure(list(rowname = c("pat", "bus", "pat"), col1 = c(122, 
222, 122), col2 = c("A", "G", "A"), col3 = c("T", "C", "G")), 
.Names = c("rowname", 
"col1", "col2", "col3"), row.names = c(NA, -3L), class = "data.frame")
akrun
  • 874,273
  • 37
  • 540
  • 662
  • It seems to work, but it doesn't delete the rows that are already present. Doesn't work as you have shown(with my data!). – MAPK Jun 24 '15 at 09:09
  • @user277653 The code is based on the example you showed. If you data is different, you may need to show an example that mimics your original data – akrun Jun 24 '15 at 09:10
  • > library(dplyr) > df %>% + group_by_(.dots=names(df)[1:3]) %>% + summarise(col3=toString(col3)) my Result: col3 1 T, C, G – MAPK Jun 24 '15 at 09:18
  • @user277653 Here, I used `names(df)[1:3]` as the last column i.e. 4th is col3 and we are taking all other columns as the grouping variable. In your original dataset, how many columns are present and what are the grouping variables? – akrun Jun 24 '15 at 09:19
  • There are altogether 8 columns and the variation is in the 5th column only. – MAPK Jun 24 '15 at 09:21
  • @user277653 Then use `names(df)[-5]` – akrun Jun 24 '15 at 09:23