I have a data frame, df:
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A G NA NA F
Pathway8 A G NA NA E
Pathway9 A G Z H F
Pathway6 A G Z H E
Pathway2 A G D NA F
Pathway5 A G D NA E
Pathway1 A D K NA F
Pathway7 A B C D F
Pathway4 A B C D E
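I want to consolidate the rows that share the same Beginning1 through Protein4 values, combining their Biomarker1 values into one cell, so that the result (newdf) looks like this: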
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A G NA NA F, E
Pathway9 A G Z H F, E
Pathway2 A G D NA F, E
Pathway1 A D K NA F
Pathway4 A B C D F, E
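A minimal base-R sketch of this consolidation, assuming the unlabeled first column is stored in a column called Pathway (a hypothetical name) and that the missing proteins are real NA values. It groups rows on Beginning1 through Protein4 and joins the distinct Biomarker1 values with ", ". It keeps the first Pathway label in each group, so the last row comes out as Pathway7 rather than Pathway4.

```r
# Example data from the question; the unlabeled first column is stored as
# "Pathway" (a hypothetical column name)
df <- data.frame(
  Pathway    = c("Pathway3", "Pathway8", "Pathway9", "Pathway6", "Pathway2",
                 "Pathway5", "Pathway1", "Pathway7", "Pathway4"),
  Beginning1 = rep("A", 9),
  Protein2   = c("G", "G", "G", "G", "G", "G", "D", "B", "B"),
  Protein3   = c(NA, NA, "Z", "Z", "D", "D", "K", "C", "C"),
  Protein4   = c(NA, NA, "H", "H", NA, NA, NA, "D", "D"),
  Biomarker1 = c("F", "E", "F", "E", "F", "E", "F", "F", "E"),
  stringsAsFactors = FALSE
)

key_cols <- c("Beginning1", "Protein2", "Protein3", "Protein4")

# One grouping key per row; paste() writes NA as "NA", so rows with missing
# proteins still fall into the same group instead of being dropped
key <- do.call(paste, c(df[key_cols], sep = "|"))

# Row indices for each group, in order of first appearance
groups <- split(seq_len(nrow(df)), factor(key, levels = unique(key)))

# Keep the first row of each group and join its distinct biomarkers
newdf <- do.call(rbind, lapply(groups, function(i) {
  row <- df[i[1], c("Pathway", key_cols)]
  row$Biomarker1 <- paste(unique(df$Biomarker1[i]), collapse = ", ")
  row
}))
rownames(newdf) <- NULL
print(newdf)
```

The key is built with paste() rather than by calling aggregate() because aggregate() omits rows whose grouping columns contain NA, which would silently drop five of the nine rows here.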
This follows on from a question I asked earlier (Consolidating duplicate rows in a dataframe). The answer there works for this small example, but on my much larger dataset it does not combine the values. Here are the first few lines of output after I adapted the code given by @Matt Jewett (or followed the explanations in Concatenate strings by group with dplyr):
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway1 Smoothened Gl-1 Osteopontin
Pathway2 Smoothened Gl-1 BMP2 Osteopontin
Pathway3 Smoothened Gl-1 BMP2 DLX5
Pathway4 Smoothened Gl-1 BMP2 Osteopontin
There are two problems. First, the Biomarker1 values are not being combined. Second, several rows are still repeated (for example, Pathway2 and Pathway4 are identical). I have run out of ideas, so any suggestions would be much appreciated!
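One possible explanation, offered only as a guess since the real data isn't shown: cells that print the same may not actually be equal, for example because of trailing spaces, or because blanks are stored as "" in some rows and NA in others. Such rows get different grouping keys, so nothing collapses. The rows below are hypothetical and only illustrate how to check for this by counting distinct keys before and after trimming whitespace and converting "" to NA; normalise() and make_key() are illustrative helper names.

```r
# Hypothetical rows: the proteins look identical when printed, but one value
# has a trailing space and blanks are stored as "" in one row, NA in another
big <- data.frame(
  Pathway    = c("Pathway2", "Pathway4", "Pathway10"),
  Beginning1 = c("Smoothened", "Smoothened ", "Smoothened"),
  Protein2   = c("Gl-1", "Gl-1", "Gl-1"),
  Protein3   = c("BMP2", "BMP2", "BMP2"),
  Protein4   = c("", NA, NA),
  Biomarker1 = c("Osteopontin", "Osteopontin", "DLX5"),
  stringsAsFactors = FALSE
)
key_cols <- c("Beginning1", "Protein2", "Protein3", "Protein4")

# Hypothetical helper: one "|"-separated key string per row
make_key <- function(d) do.call(paste, c(d[key_cols], sep = "|"))

# Hypothetical helper: convert factors to character, trim whitespace,
# and treat empty strings as missing in the grouping columns
normalise <- function(d) {
  for (col in key_cols) {
    x <- trimws(as.character(d[[col]]))
    x[which(x == "")] <- NA
    d[[col]] <- x
  }
  d
}

raw_keys   <- make_key(big)
clean_keys <- make_key(normalise(big))
cat("distinct keys before cleaning:", length(unique(raw_keys)), "\n")
cat("distinct keys after cleaning: ", length(unique(clean_keys)), "\n")
```

If the two counts differ on the real data, normalising the grouping columns before grouping should let the repeated rows collapse and their Biomarker1 values combine.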
Thank you so much for your help!