Making identical mirrored character strings identical in R

Question

Thank you in advance for any help with this issue.

I am currently conducting an analysis where we are looking at pairwise differences between insect groups. However, I have run into a frustrating issue that I can't seem to fix without hundreds of lines of code.

For example, we have pairwise comparisons stored as character strings, which were constructed from two columns, Group1 and Group2, using paste. This produces mirrored labels for the same pair, e.g. A_B and B_A.

Does anyone know a way to make both of these bee_beetle? Or is there a different function we should use to construct our pairwise groups?

Here is a quick example...

df=cbind.data.frame(c("A","B","C","D"),c("B","A","D","C"))
colnames(df)=c("Group1","Group2")
paste(df$Group1,df$Group2,sep="_")

"A_B" "B_A" "C_D" "D_C"

But I would like "A_B", "A_B", "C_D", "C_D", irrespective of which column (Group1 or Group2) each group appears in.

We have about 400-odd groupings that we need to normalise.

Thanks again

Liam

Liam, welcome to SO! This question is too vague to get a meaningful answer. Please read about how to ask good questions, especially parts about sample data, code you've tried (and why it doesn't work), and what is your expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans, Nov 07 '18 at 01:57
Apologies, I have added a quick example to make the question better! – Liam Kendall, Nov 07 '18 at 02:19

Evan Friedland · Accepted Answer · 2018-11-07T02:26:53.303

1

You've given very little information, but if you would like to make sure beetle_bee becomes bee_beetle, you can split the string at the underscore, sort the pieces alphabetically, and then reassemble them.

paste(sort(unlist(strsplit(x = "beetle_bee",split = "_"))),collapse="_")
#[1] "bee_beetle"

strsplit will split "beetle_bee" into a list whose first element is the character vector "beetle", "bee". To sort that alphabetically, I flatten the list with unlist and then sort. Finally, I paste the sorted pieces back together using collapse.
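The same steps can be applied to a whole character vector of labels rather than a single string. This is only a minimal sketch, assuming every label consists of two group names joined by one underscore; normalise_pair is a hypothetical helper name, not something from base R or from the code above.

```r
# Hypothetical helper: split each label, sort its pieces, and paste them back
normalise_pair <- function(labels, sep = "_") {
  # strsplit returns a list with one character vector per label
  parts <- strsplit(labels, split = sep, fixed = TRUE)
  # vapply guarantees one string per label in the result
  vapply(parts, function(p) paste(sort(p), collapse = sep), character(1))
}

pairs <- c("A_B", "B_A", "C_D", "D_C")
normalise_pair(pairs)
#[1] "A_B" "A_B" "C_D" "C_D"
```

Because the pieces are sorted within each label, a mirrored label such as "B_A" ends up the same as "A_B" without needing the original Group1 and Group2 columns.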

EDIT:

df=cbind.data.frame(c("A","B","C","D"),c("B","A","D","C"))
colnames(df)=c("Group1","Group2")
apply(df[,c("Group1","Group2")], MARGIN = 1, function(x){
  paste(sort(x), collapse = "_")
})
#[1] "A_B" "A_B" "C_D" "C_D"
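If the row-wise apply becomes slow on larger tables, a vectorised alternative may be worth trying. The sketch below swaps in pmin and pmax, which is a different technique from the apply call above; it assumes the two columns hold plain group names, and it converts them with as.character because data frames created before R 4.0 store strings as factors by default. The column name pair is made up for illustration.

```r
df <- data.frame(Group1 = c("A", "B", "C", "D"),
                 Group2 = c("B", "A", "D", "C"),
                 stringsAsFactors = FALSE)

# as.character() guards against factor columns
g1 <- as.character(df$Group1)
g2 <- as.character(df$Group2)

# pmin/pmax pick the alphabetically smaller/larger name in each row,
# so the smaller name always comes first in the label
df$pair <- paste(pmin(g1, g2), pmax(g1, g2), sep = "_")
df$pair
#[1] "A_B" "A_B" "C_D" "C_D"
```

This only handles pairs of two columns; for comparisons across three or more columns, the apply approach with sort generalises more easily.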

edited Nov 07 '18 at 02:26

answered Nov 07 '18 at 02:05

Evan Friedland


Thank you very much for your response, Evan. However, we need to do this across a few hundred groups, but I will use this solution if I need to :) – Liam Kendall Nov 07 '18 at 02:20
Please see my edit now that you have provided more clarity in your question. – Evan Friedland Nov 07 '18 at 02:27
It now uses the apply function with MARGIN = 1, meaning, for each row, sort and combine the groups. – Evan Friedland Nov 07 '18 at 02:29
