I am working with proteomic data and testing differences between versions of the analysis software. We want a table that shows which versions of the software each protein appears in.

Below is a simplified version of the data table I currently have:

Version Protein.ID Protein name
1.1     A          name 1
1.2     A          name 1
1.1     B          name 2
1.2     B          name 2

I want my table to look like this:

Version   Protein.ID Protein name
1.1, 1.2  A          name 1
1.1, 1.2  B          name 2

I have been searching here and on the web for two days and cannot find a solution.

I have tried using spread and aggregate, but neither worked: I got either a huge number of columns or a single column lacking the information I was after. I also tried base R commands such as paste, but could not get rid of duplicate values.

Example of something I tried:

allver.mergeVerID <- spread(allver.ids, Protein.ID, Ver.ID.Porder)
Error: Each row of output must be identified by a unique combination of keys. 
Keys are shared for 5311 rows:

I also get this error using

allver.mergeVerID <- allver.ids %>% group_by(Protein.ID) %>% 
  summarise(Ver.ID.Porder= toString(Ver.ID.Porder), )

OR

allver.mergeVerID <- aggregate(Ver.ID.Porder ~ Protein.ID, allver.ids, toString)

What does this error mean?

camille

1 Answer

Here is one way. After grouping by 'Protein.ID' and 'Protein name', summarise 'Version' by pasting its elements together into a single comma-separated string with toString

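library(dplyr)
df1 %>%
  # one group per protein; keeping `Protein name` in the grouping retains it in the result
  group_by(Protein.ID, `Protein name`) %>%
  # collapse all versions in a group into one string, e.g. "1.1, 1.2"
  summarise(Version = toString(Version))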

Or with aggregate from base R

aggregate(Version ~ Protein.ID + `Protein name`, df1, toString)
#  Protein.ID Protein name  Version
#1          A       name 1 1.1, 1.2
#2          B       name 2 1.1, 1.2

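NOTE: Both solutions give the expected values; the only difference from the expected output is that the Version column comes last instead of first.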
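The question also mentions trouble with duplicate values. A minimal base R sketch of one way to handle that, assuming the real data can list the same version more than once for a protein: wrap the values in unique() before toString(). The helper name collapse_unique and the extra repeated row in df_dup are hypothetical, added only for illustration.

```r
# Example data from the answer.
df1 <- data.frame(Version = c(1.1, 1.2, 1.1, 1.2),
                  Protein.ID = c('A', 'A', 'B', 'B'),
                  `Protein name` = c('name 1', 'name 1', 'name 2', 'name 2'),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical: repeat the first row so that protein A lists version 1.1 twice.
df_dup <- rbind(df1, df1[1, ])

# Hypothetical helper: drop repeated versions, sort them, then collapse to one string.
collapse_unique <- function(x) toString(sort(unique(x)))

res <- aggregate(Version ~ Protein.ID + `Protein name`, df_dup, collapse_unique)

# Move Version to the front to match the layout asked for in the question.
res <- res[c("Version", "Protein.ID", "Protein name")]
res
```

Because Version is stored as a number here, sort() orders it numerically. If the real versions are strings such as "1.10", the ordering would be alphabetical instead, so the sort step may need adjusting.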

data

df1 <- data.frame(Version = c(1.1, 1.2, 1.1, 1.2),
                  Protein.ID = c('A', 'A', 'B', 'B'),
                  `Protein name` = c('name 1', 'name 1', 'name 2', 'name 2'),
                  check.names = FALSE, stringsAsFactors = FALSE)
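As for what the spread error means: spread needs every output row to be identified by a unique combination of the columns that are not being spread, and it complains when several input rows share the same combination. A rough base R check on the example data, assuming 'Version' is the value column being spread; the names key_cols and shared are hypothetical, and the count is not guaranteed to match the number in the error message.

```r
# Example data from the answer.
df1 <- data.frame(Version = c(1.1, 1.2, 1.1, 1.2),
                  Protein.ID = c('A', 'A', 'B', 'B'),
                  `Protein name` = c('name 1', 'name 1', 'name 2', 'name 2'),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Columns left to identify a row once the value column is set aside.
key_cols <- setdiff(names(df1), "Version")

# TRUE for every row whose key combination also occurs in another row.
shared <- duplicated(df1[key_cols]) |
  duplicated(df1[key_cols], fromLast = TRUE)

# Number of rows that share their keys with at least one other row.
sum(shared)

# Inspect the offending rows.
df1[shared, ]
```

Here every protein appears once per version, so all rows share their keys. That is why collapsing the versions with toString, rather than spreading them, fits this data.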
akrun