Basically, I have a dataframe, df
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A G NA NA F
Pathway8 Z G NA NA E
Pathway9 A G Z H F
Pathway6 Y G Z H E
Pathway2 A G D NA F
Pathway5 Q G D NA E
Pathway1 A D K NA F
Pathway7 A B C D F
Pathway4 V B C D E
And I want to combine the dataframe so that those rows when are identical from "Protein2" to "Protein4" are condense, giving the following:
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A,Z G NA NA F,E
Pathway9 A,Y G Z H F,E
Pathway2 A,Q G D NA F,E
Pathway1 A D K NA F
Pathway7 A,V B C D F,E
This is very similar to a question that I asked before (Consolidating duplicate rows in a dataframe), however the difference is that I am also consolidating the "Beginning1" row.
So far, I have tried:
library(dat.table)
dat<-data.table(df)
Total_collapse <- dat[, .(
Biomarker1 = paste0(Biomarker1, collapse = ", ")),
by = .(Beginning1, Protein1, Protein2, Protein3)]
Total_collapse <- dat[, .(
Beginning1 = paste0(Beginning1, collapse = ", ")),
by = .(Protein1, Protein2, Protein3)]
which gives the output:
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 G NA NA F,E
Pathway9 G Z H F,E
Pathway2 G D NA F,E
Pathway1 D K NA F
Pathway7 B C D F,E
Does anyone know how to fix this problem? I have also tried duplicating the solution from Collapse / concatenate / aggregate a column to a single comma separated string within each group, but have had no success.
I am sorry if it is a simple error- I am pretty new to R.