
Here are the original data frame and the target data frame:

dataframeA <- data.frame(complex_id = c(1, 1, 2, 2, 3, 3),
                         complex_name = c("BCL6-HDAC4 complex - Human",
                                          "BCL6-HDAC4 complex - Human",
                                          "BCL6-HDAC5 complex - Human",
                                          "BCL6-HDAC5 complex - Human",
                                          "BCL6-HDAC7 complex - Human",
                                          "BCL6-HDAC7 complex - Human"),
                         protein_id = c("P41182",
                                        "P56524",
                                        "P41182",
                                        "Q9UQL6",
                                        "P41182",
                                        "Q8WUI4"))

dataframeB <- data.frame(complex_id = c("1", "1", "1;2;3", "3"),
                         complex_name = c("BCL6-HDAC5 complex - Human",
                                          "BCL6-HDAC5 complex - Human",
                                          "BCL6-HDAC7 complex - Human",
                                          "BCL6-HDAC7 complex - Human"),
                         protein_id = c("P56524",
                                        "Q9UQL6",
                                        "P41182",
                                        "Q8WUI4"))

How can I convert dataframeA into dataframeB?

Mengge Lyu
  • Could you clarify the rules for merging? What happened to "BCL6-HDAC4"? – zx8754 May 16 '23 at 08:47
  • Oh! I forgot to state the merge rule: if the values in the "complex_name" column and the "protein_id" column are both the same, merge the numbers in "complex_id", separated by ";". – Mengge Lyu May 16 '23 at 09:01
  • P41182 is shared by 3 complex_names, so I understand the id of 1;2;3. But what happened to `BCL6-HDAC4`? – zx8754 May 16 '23 at 09:04
  • I think you need "group by paste", see this and linked posts: https://stackoverflow.com/q/22756372/680068 , try `aggregate(dataframeA[ "complex_id" ], dataframeA[ "protein_id" ], FUN = toString)` – zx8754 May 16 '23 at 09:07
  • Delete the duplicated "protein_id" entries if they have the same "complex_name", and keep the remaining "complex_name" entries after the duplicated "protein_id" entries are removed. – Mengge Lyu May 16 '23 at 11:42
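
The "group by paste" idea from the comments could be sketched roughly as below. This is only a minimal sketch, assuming the intended rule is to collapse every complex_id that shares a protein_id into one ";"-separated string; the name `merged` is introduced here for illustration. It does not choose which complex_name to keep, because the thread leaves that rule open, so the result is not guaranteed to reproduce dataframeB row for row.

```r
# Input data frame, copied from the question
dataframeA <- data.frame(
  complex_id = c(1, 1, 2, 2, 3, 3),
  complex_name = c("BCL6-HDAC4 complex - Human",
                   "BCL6-HDAC4 complex - Human",
                   "BCL6-HDAC5 complex - Human",
                   "BCL6-HDAC5 complex - Human",
                   "BCL6-HDAC7 complex - Human",
                   "BCL6-HDAC7 complex - Human"),
  protein_id = c("P41182", "P56524", "P41182",
                 "Q9UQL6", "P41182", "Q8WUI4")
)

# Assumed rule: one row per protein_id, with all of its distinct
# complex_id values joined by ";" (base R only, no extra packages)
merged <- aggregate(
  complex_id ~ protein_id,
  data = dataframeA,
  FUN = function(x) paste(unique(x), collapse = ";")
)

print(merged)
```

With this rule P41182, which appears in all three complexes, gets the id "1;2;3", while the other proteins keep the single id they have in dataframeA. Using `toString` as in the comment would join the ids with ", " instead of ";".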

0 Answers