
I need to figure out how to sort the strings in the first column in alphabetical order in R.

Example input:

AAAA    A.A 0           0
ABAB    A.B 0.046372    2.59202
ACAC    A.C 0.108911    2.71921
ADAD    A.D 0.054307    3.620057
AEAE    A.E 0.042022    2.534175
AFAF    A.F 0.043243    0.212713
AGAG    A.G 0.046828    0.162782
AHAH    A.H 0.02122     2.169073
AIAI    A.I 0.038536    1.960308
AJAJ    A.J 0.034669    1.954065
AOAO    A.O 1.047243    3.799053
BABA    B.A 0.046372    2.59202
BBBB    B.B 0           0
BCBC    B.C 0.100474    1.802687
BDBD    B.D 0.051434    2.328003
BEBE    B.E 0.041227    2.075464
BFBF    B.F 0.039445    2.518254
BGBG    B.G 0.019662    2.758563
BHBH    B.H 0.050746    3.54119
BIBI    B.I 0.033351    3.502687
BJBJ    B.J 0.045264    3.407983
BOBO    B.O 1.005559    -1

Needed output:

AAAA    A.A 0   0
AABB    A.B 0.046372    2.59202
AABB    B.A 0.046372    2.59202

etc.

M--
amonsie
  • Do you mean actually sorting the characters in each string? I think https://stackoverflow.com/questions/5904797/how-to-sort-letters-in-a-string might be helpful if so. – thelatemail Dec 04 '19 at 22:17
  • Does this answer your question? [How to sort a dataframe by multiple column(s)](https://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-multiple-columns) – M-- Dec 04 '19 at 22:32
  • Yes, sorting the characters in each string and then ordering the data set by the first column. (This was originally a symmetric matrix in which the values were duplicated.) – amonsie Dec 04 '19 at 22:49
  • @amonsie, I edited my answer, let me know if it is what you are looking for – dc37 Dec 04 '19 at 22:50

1 Answer

You can use `order` or `sort` to sort a character vector alphabetically (see @tim's comment for the difference between the two). For example:

vec <- c("AAAA","DEFA","AAAB","CBDA","AAAC","DEFG")
vec[order(vec)]

[1] "AAAA" "AAAB" "AAAC" "CBDA" "DEFA" "DEFG"
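
To make the difference between the functions mentioned in the comments concrete, here is a small sketch on the same example vector. The variable names `sorted`, `idx` and `ranks` are only illustrative.

```r
# Same example vector as above
vec <- c("AAAA", "DEFA", "AAAB", "CBDA", "AAAC", "DEFG")

sorted <- sort(vec)   # the values themselves, in sorted order
idx    <- order(vec)  # positions in vec that put it in sorted order
ranks  <- rank(vec)   # the rank each element would have once sorted

# Indexing with the result of order() reproduces sort()
identical(vec[idx], sorted)
```

`order` is the one to reach for when other columns have to follow the same reordering, which is why it is used on the data frame further down.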

Then, if you want to reorder the letters inside each string, you need to split the string into characters and sort them. This can be achieved with (thanks to @H1 for the improvement of the function):

vec <- sapply(vec, function(v) {paste0(sort(unlist(strsplit(v,""))),collapse = "")})

> vec
  AAAA   DEFA   AAAB   CBDA   AAAC   DEFG 
"AAAA" "ADEF" "AAAB" "ABCD" "AAAC" "DEFG" 
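
If the names attached by `sapply` get in the way, the same idea can be wrapped in a small helper. This is only a sketch: `sort_chars` is a hypothetical name, and it relies on `strsplit` being vectorised and on `vapply` to guarantee one string per input.

```r
# Hypothetical helper: sort the characters inside each string of x
sort_chars <- function(x) {
  vapply(
    strsplit(x, ""),                             # one character vector per string
    function(ch) paste(sort(ch), collapse = ""), # sort the letters and glue them back
    character(1)                                 # each result is a single string
  )
}

res <- sort_chars(c("BABA", "CBDA"))
res
```

Because `strsplit` returns an unnamed list here, the result is a plain character vector without names.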

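And if you want to combine both, wrap the result in `sort` (`unname` drops the names that `sapply` attaches, so the result prints as a plain vector):

vec <- unname(sort(sapply(vec, function(v) {paste0(sort(unlist(strsplit(v,""))),collapse = "")})))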

> vec
[1] "AAAA" "AAAB" "AAAC" "ABCD" "ADEF" "DEFG"

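So, in your example, if your data frame is called df, you can do:

# Example data frame standing in for your data
df <- data.frame(vec = c("AAAA","DEFA","AAAB","CBDA","AAAC","DEFG"),
                numb = c(1,2,3,4,5,6))
# Make sure the first column is character, not factor
df[,1] <- as.character(df[,1])

# Sort the letters within each string of the first column
df[,1] <- sapply(df[,1], function(v) { paste0(sort(unlist(strsplit(v,""))),collapse = "")})
# Then sort the rows by the first column
df <- df[order(df[,1]),]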

And you get:

> df
   vec numb
1 AAAA    1
3 AAAB    3
5 AAAC    5
4 ABCD    4
2 ADEF    2
6 DEFG    6
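
Applied to a few rows from the question, the same two steps might look like the sketch below. The question shows no header row, so the column names `key`, `pair`, `v1` and `v2` are assumptions, as is the helper name `sort_chars`; ordering by `pair` as a tie-breaker is also my choice, to keep rows with the same key in a stable order.

```r
# A few rows copied from the question; column names are assumed
df <- data.frame(
  key  = c("AAAA", "ABAB", "ACAC", "BABA", "BBBB", "BCBC"),
  pair = c("A.A", "A.B", "A.C", "B.A", "B.B", "B.C"),
  v1   = c(0, 0.046372, 0.108911, 0.046372, 0, 0.100474),
  v2   = c(0, 2.59202, 2.71921, 2.59202, 0, 1.802687),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort the characters inside each string
sort_chars <- function(x) {
  vapply(strsplit(x, ""),
         function(ch) paste(sort(ch), collapse = ""),
         character(1))
}

df$key <- sort_chars(df$key)        # "ABAB" and "BABA" both become "AABB"
df <- df[order(df$key, df$pair), ]  # sort rows by key, then by pair
df
```

With these rows, the A.B and B.A entries end up next to each other under the key "AABB", as in the needed output.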
dc37
  • Yep; so didn't get it even at second look. :) – markus Dec 04 '19 at 22:25
  • In the expected output, the third line is actually "BABA" reordered as "AABB". If I understood correctly, the OP wants a double ordering – dc37 Dec 04 '19 at 22:27
  • @dc37 @markus `sort()` returns the vector in sorted order. `order()` returns the indices that would put the vector in sorted order. `rank()` returns the rank each element would have if the vector were sorted. – tim Dec 04 '19 at 22:29
  • @tim But OP is not looking for `sort` here. Well, not only. Take a look at the third row of the expected output for example. – markus Dec 04 '19 at 22:31
  • Hm, I don't quite get what OP's looking for then. The question I was responding to in my answer was "sort the strings in the first column in alphabetical order". – tim Dec 04 '19 at 22:39
  • @tim, if you look at the expected output, the OP wants to sort both the strings in the first column and the letters in each string. – dc37 Dec 04 '19 at 22:41