I'm starting from this data frame:
Chr Gene.Symbols
2 chr1 GSTM1
3 chr2 MIR4432
4 chr2 BCL11A
5 chr2 PAPOLG
6 chr2 LINC01185
7 chr2 LINC01185
8 chr2 LINC01185, REL
9 chr2 REL
10 chr2 REL
11 chr2 REL
12 chr2 REL
13 chr2
14 chr2 PUS10
15 chr2 PEX13, KIAA1841
I would like to have this result:
Chr Gene.Symbols
2 chr1 GSTM1
3 chr2 MIR4432,BCL11A,PAPOLG,LINC01185,REL,PUS10,PEX13,KIAA1841
I've managed to aggregate the gene symbols together using:
aggregate(Gene.Symbols~Chr, data, paste, collapse = ",")
which I learned from similar questions, but I wasn't able to remove the duplicates (e.g. LINC01185 and REL appear several times, including inside "LINC01185, REL").
Can someone help me, please?
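One possible approach, sketched below: keep the same aggregate() call but pass a custom function instead of paste. The function splits every cell on commas (so "LINC01185, REL" becomes two genes), trims the spaces, drops blank entries like the one in row 13, keeps only the first occurrence of each gene with unique(), and then pastes the result back together. The data frame is reconstructed by hand from the printout, and the blank cell is assumed to be an empty string rather than NA. The helper name collapse_unique is only illustrative.

```r
# Data reconstructed from the printed output (assumption: row 13 is "" not NA)
data <- data.frame(
  Chr = c("chr1", rep("chr2", 13)),
  Gene.Symbols = c("GSTM1", "MIR4432", "BCL11A", "PAPOLG", "LINC01185",
                   "LINC01185", "LINC01185, REL", "REL", "REL", "REL", "REL",
                   "", "PUS10", "PEX13, KIAA1841"),
  row.names = 2:15,
  stringsAsFactors = FALSE  # strsplit() needs character, not factor
)

# Hypothetical helper: split on commas, trim spaces, drop blanks,
# keep the first occurrence of each gene, glue back with ","
collapse_unique <- function(x) {
  genes <- trimws(unlist(strsplit(x, ",")))
  genes <- genes[genes != ""]
  paste(unique(genes), collapse = ",")
}

result <- aggregate(Gene.Symbols ~ Chr, data = data, FUN = collapse_unique)
print(result)
```

Because unique() keeps the first occurrence, the genes stay in the order they appear in the original rows, which matches the expected output above.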
UPDATE: I also need a file with only the gene names, one per row (without the "Chr" column). How can I transpose the gene names? I'm now starting from a file with one row per Chr value, where each row has several comma-separated genes in the Gene.Symbols column.
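A minimal sketch for the UPDATE, assuming the starting file has already been read into a data frame with a Chr column and a comma-separated Gene.Symbols column (the agg table below is a hypothetical stand-in for that file). Instead of transposing, it splits every cell on commas and flattens the pieces into a single vector, which already gives one gene per element. The output path is a placeholder.

```r
# Hypothetical stand-in for the file with one row per chromosome
agg <- data.frame(
  Chr = c("chr1", "chr2"),
  Gene.Symbols = c("GSTM1",
                   "MIR4432,BCL11A,PAPOLG,LINC01185,REL,PUS10,PEX13,KIAA1841"),
  stringsAsFactors = FALSE
)

# Split each cell on commas, flatten to one vector, trim spaces, drop blanks
genes <- trimws(unlist(strsplit(agg$Gene.Symbols, ",")))
genes <- genes[genes != ""]
# unique() removes genes repeated across rows; drop it to keep repeats
genes <- unique(genes)

# One gene per row, no Chr column
gene_table <- data.frame(Gene.Symbols = genes, stringsAsFactors = FALSE)

# Write one gene per line with no header (placeholder path)
out_file <- file.path(tempdir(), "genes.txt")
writeLines(genes, out_file)
```

writeLines() produces a bare list of names; write.table(gene_table, out_file, row.names = FALSE, quote = FALSE) would do the same but keep a "Gene.Symbols" header line.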