I have data describing genes, some of which appear in duplicate. For the duplicated genes I'd like to compress the information so that nothing is lost and all of a gene's duplicate rows are combined into one row. I've seen similar questions (such as "How to combine duplicate rows in a data frame in R"), but those select the largest value among the duplicates; I haven't found a question that simply keeps all of the duplicate information in a single row.
For example, I have data like this:
gene pvalue info
ACE 0.7 benign
ACE 0.001 pathogenic
ACE 0.5 benign
BRCA 0.01 benign
NOS 0.2 benign
NOS 0.003 pathogenic
NOS 0.57 benign
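To make the example reproducible, the table above can be read straight into R. This is a minimal sketch that assumes the columns are whitespace-separated exactly as shown; the name `df` is just an illustrative choice.

```r
# Read the example table above into a data frame.
# stringsAsFactors = FALSE keeps `gene` and `info` as character columns
# (the default only changed to FALSE in R 4.0.0).
df <- read.table(text = "
gene pvalue info
ACE 0.7 benign
ACE 0.001 pathogenic
ACE 0.5 benign
BRCA 0.01 benign
NOS 0.2 benign
NOS 0.003 pathogenic
NOS 0.57 benign
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types: gene/info are character, pvalue is numeric.
str(df)
```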
I want the duplicates to be compressed into a single row per gene:
gene pvalue info
ACE 0.7, 0.001, 0.5 benign, pathogenic, benign
BRCA 0.01 benign
NOS 0.2, 0.003, 0.57 benign, pathogenic, benign
After compressing, I plan to write code that selects either the largest or the smallest number within each gene's numeric cell.
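Once the p-values are stored as comma-separated text, selecting the largest or smallest one means splitting each cell back into numbers first. The sketch below assumes a hypothetical `compressed` data frame laid out like the desired output above, with `", "` as the separator; the `min_p`/`max_p` column names are illustrative.

```r
# Hypothetical compressed table mirroring the desired output above.
compressed <- data.frame(
  gene   = c("ACE", "BRCA", "NOS"),
  pvalue = c("0.7, 0.001, 0.5", "0.01", "0.2, 0.003, 0.57"),
  info   = c("benign, pathogenic, benign", "benign",
             "benign, pathogenic, benign"),
  stringsAsFactors = FALSE
)

# Split each comma-separated cell (allowing optional spaces) and convert
# the pieces back to numbers; the result is a list with one vector per gene.
pvals <- lapply(strsplit(compressed$pvalue, ",\\s*"), as.numeric)

# Smallest and largest p-value for each gene.
compressed$min_p <- vapply(pvals, min, numeric(1))
compressed$max_p <- vapply(pvals, max, numeric(1))

compressed[, c("gene", "min_p", "max_p")]
```

If the original uncompressed data is still available, `aggregate(pvalue ~ gene, data = df, FUN = min)` (or `FUN = max`) gives the same per-gene result without the round trip through text.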
To compress the duplicate gene information, I've tried aggregate(), but it requires a FUN argument, and I don't know what to pass so that every value is kept rather than summarised into one number.
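One possible way around this, sketched under the assumption that a comma-separated string per cell is an acceptable result: FUN does not have to summarise, so it can simply paste each group's values together. The data frame is rebuilt inline here so the snippet runs on its own; `df` and `collapsed` are illustrative names.

```r
df <- data.frame(
  gene   = c("ACE", "ACE", "ACE", "BRCA", "NOS", "NOS", "NOS"),
  pvalue = c(0.7, 0.001, 0.5, 0.01, 0.2, 0.003, 0.57),
  info   = c("benign", "pathogenic", "benign", "benign",
             "benign", "pathogenic", "benign"),
  stringsAsFactors = FALSE
)

# `. ~ gene` applies FUN to every other column within each gene.
# Pasting with collapse = ", " keeps all values, in their original row
# order, instead of reducing them to a single summary number.
collapsed <- aggregate(. ~ gene, data = df,
                       FUN = function(x) paste(x, collapse = ", "))

collapsed
```

Note that the `pvalue` column of `collapsed` is character rather than numeric, since the pasted values are stored as text.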