
Thank you in advance for any help with this issue.

I am currently conducting an analysis where we are looking at pairwise differences between insect groups. However, I have run into a frustrating issue that I can't seem to fix without hundreds of lines of code.

For example, we have pairwise comparisons stored as character strings, which were constructed from two columns, Group1 and Group2, using paste. However, this produces mirrored groups, i.e. A_B and B_A.

Does anyone know a solution so that both mirrored versions become the same label (e.g. bee_beetle for both bee_beetle and beetle_bee)? Or, alternatively, a different function to build our pairwise groups?

Here is a quick example...

df=cbind.data.frame(c("A","B","C","D"),c("B","A","D","C"))
colnames(df)=c("Group1","Group2")
paste(df$Group1,df$Group2,sep="_")
#[1] "A_B" "B_A" "C_D" "D_C"

But I would like "A_B", "A_B", "C_D", "C_D", irrespective of which column (Group1 or Group2) each name comes from.

We have 400-odd groupings that we need to normalise.

Thanks again

Liam

  • Liam, welcome to SO! This question is too vague to get a meaningful answer. Please read about how to ask good questions, especially the parts about sample data, code you've tried (and why it doesn't work), and what your expected output is. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Nov 07 '18 at 01:57
  • Apologies, I have added a quick example to make the question better! – Liam Kendall Nov 07 '18 at 02:19

1 Answer


You've given very little information, but if you would like to make sure beetle_bee becomes bee_beetle, you can split the string on the underscore, sort the parts alphabetically, and then reassemble them.

paste(sort(unlist(strsplit(x = "beetle_bee",split = "_"))),collapse="_")
#[1] "bee_beetle"

strsplit splits "beetle_bee" into a list whose first (and only) element is the character vector "beetle", "bee". To sort that alphabetically, I first flatten the list with unlist and then call sort. Finally, paste with collapse = "_" joins the sorted parts back into a single string.
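With 400-odd labels, the same split/sort/paste idea can be applied to a whole character vector at once. The sketch below assumes the labels have already been pasted together with "_" and that no group name itself contains an underscore; `normalise_pairs` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: normalise "X_Y" labels so that mirrored pairs match.
# Assumes "_" only ever appears as the separator between the two group names.
normalise_pairs <- function(labels, sep = "_") {
  # strsplit returns a list with one character vector per label
  parts <- strsplit(labels, split = sep, fixed = TRUE)
  # sort each pair alphabetically and glue it back together
  vapply(parts, function(p) paste(sort(p), collapse = sep),
         character(1), USE.NAMES = FALSE)
}

normalise_pairs(c("A_B", "B_A", "C_D", "D_C", "beetle_bee"))
# returns c("A_B", "A_B", "C_D", "C_D", "bee_beetle")
```

vapply is used instead of sapply so the result is guaranteed to be a plain character vector with one entry per input label.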

EDIT:

df=cbind.data.frame(c("A","B","C","D"),c("B","A","D","C"))
colnames(df)=c("Group1","Group2")
apply(df[,c("Group1","Group2")], MARGIN = 1, function(x){
  paste(sort(x), collapse = "_")
})
#[1] "A_B" "A_B" "C_D" "C_D"
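If the two group columns are still available, a vectorised alternative to apply() (a sketch, not part of the original answer) is to take the element-wise alphabetical minimum and maximum of the two columns with pmin() and pmax(), which also work on character vectors, and paste those. The as.character() calls guard against the columns having been stored as factors (the default for character columns in data.frame() before R 4.0.0), since pmin() and pmax() are not meaningful for unordered factors.

```r
# Same example data as above, built with data.frame() for clarity
df <- data.frame(Group1 = c("A", "B", "C", "D"),
                 Group2 = c("B", "A", "D", "C"),
                 stringsAsFactors = FALSE)

# Make sure both columns are plain character vectors, not factors
g1 <- as.character(df$Group1)
g2 <- as.character(df$Group2)

# pmin()/pmax() compare element-wise, so the alphabetically smaller
# name always comes first, whichever column it was in
df$Pair <- paste(pmin(g1, g2), pmax(g1, g2), sep = "_")
df$Pair
# returns c("A_B", "A_B", "C_D", "C_D")
```

Like sort(), pmin() and pmax() compare strings using the current locale's collation order, so names that differ only in case may be ordered differently on different machines.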
Evan Friedland