df
         Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A          G        NA       NA       F
Pathway6 A          G        NA       NA       E
Pathway1 A          B        C        D        F
Pathway2 A          B        H        NA       F
Pathway4 A          B        C        D        E
Pathway5 A          B        H        NA       F
I would like to reorder the data frame above (df) so that the pathways whose protein steps are most similar (that is, the rows with the greatest similarity in columns 2:4) are sorted next to each other.
To be clearer, I would like the output to look like this:
newdf
         Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway6 A          G        NA       NA       E
Pathway3 A          G        NA       NA       F
Pathway5 A          B        H        NA       E
Pathway2 A          B        H        NA       F
Pathway4 A          B        C        D        E
Pathway1 A          B        C        D        F
How would one go about doing that? I've tried several approaches, including unique(df), but none have worked so far.
Also, while simply ordering by the number of non-NA entries would work for this dataset, the actual dataset I will be analyzing has hundreds of pathways with the same number of steps.
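For reference, here is a minimal sketch of the kind of thing I am after, not a working solution I am committed to. It rebuilds df as shown above and orders the rows by hierarchical clustering on columns 2:4. The similarity measure (fraction of step positions that differ, with NA counted as matching NA) and the "average" linkage are assumptions made for illustration; other measures may suit the real data better.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Beginning1 = c("A", "A", "A", "A", "A", "A"),
  Protein2   = c("G", "G", "B", "B", "B", "B"),
  Protein3   = c(NA, NA, "C", "H", "C", "H"),
  Protein4   = c(NA, NA, "D", NA, "D", NA),
  Biomarker1 = c("F", "E", "F", "F", "E", "F"),
  row.names  = c("Pathway3", "Pathway6", "Pathway1",
                 "Pathway2", "Pathway4", "Pathway5"),
  stringsAsFactors = FALSE
)

# Columns 2:4 hold the protein steps to compare
steps <- as.matrix(df[, 2:4])
# Assumption: treat NA as its own symbol so that NA matches NA
steps[is.na(steps)] <- "missing"

# Pairwise dissimilarity: fraction of step positions that differ
n <- nrow(steps)
d <- matrix(0, n, n, dimnames = list(rownames(df), rownames(df)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(steps[i, ] != steps[j, ])
  }
}

# Hierarchical clustering keeps each cluster contiguous in hc$order,
# so pathways with similar steps end up in neighbouring rows
hc <- hclust(as.dist(d), method = "average")
newdf <- df[hc$order, ]
print(newdf)
```

This puts pathways with identical protein steps in adjacent rows, but it does not guarantee the exact group order or the E-before-F order within a group shown in newdf above; that would need an extra tie-breaking sort, for example on Biomarker1.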