How can I order a vector like

c("7","10a","10b","10c","8","9","11c","11b","11a","12") -> alph

into

alph
[1] "7","8","9","10a","10b","10c","11a","11b","11c","12"

and use it to sort a data.frame, like

V1 <- c("A","A","B","B","C","C","D","D","E","E")
V2 <- 2:1 
V3 <- alph
df <- data.frame(V1,V2,V3)

and order the rows by V2 and then by V3, to obtain

 V1 V2  V3
  C  1   9
  A  1 10a
  B  1 10c
  D  1 11b
  E  1  12
  A  2   7
  C  2   8
  B  2 10b
  E  2 11a
  D  2 11c
nebi
  • Do not use `data.frame(cbind(...))`, just use `data.frame(...)` directly. By calling `cbind` you make a *character matrix* containing `V1`, `V2` and `V3`, which probably isn't what you want. – Backlin Dec 05 '13 at 10:03

1 Answer

> library(gtools)
> mixedsort(alph)

[1] "7"   "8"   "9"   "10a" "10b" "10c" "11a" "11b" "11c" "12" 

To sort a data.frame, use mixedorder instead:

> mydf <- data.frame(alph, USArrests[seq_along(alph),])
> mydf[mixedorder(mydf$alph),]

            alph Murder Assault UrbanPop Rape
Alabama        7   13.2     236       58 21.2
California     8    9.0     276       91 40.6
Colorado       9    7.9     204       78 38.7
Alaska       10a   10.0     263       48 44.5
Arizona      10b    8.1     294       80 31.0
Arkansas     10c    8.8     190       50 19.5
Florida      11a   15.4     335       80 31.9
Delaware     11b    5.9     238       72 15.8
Connecticut  11c    3.3     110       77 11.1
Georgia       12   17.4     211       60 25.8

mixedorder on multiple vectors (columns)

Apparently mixedorder cannot handle multiple vectors. I have written a function that works around this by converting every character vector to a factor whose levels are sorted with mixedsort, and then passing all vectors on to the standard order function.

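# Like order(), but character vectors are compared in mixedsort (natural) order.
multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                # Factor levels in mixedsort order, so order() sorts by level position
                factor(l, levels=mixedsort(unique(l)))
            } else {
                # Non-character vectors are passed to order() unchanged
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}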
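
To see why the factor conversion works, here is a minimal sketch (assuming the gtools package is installed). A factor stores integer codes that index into its levels, and order() sorts a factor by those codes rather than by the label text.

```r
library(gtools)

alph <- c("7","10a","10b","10c","8","9","11c","11b","11a","12")

# Plain order() compares strings character by character,
# so "10a" ends up before "7".
alph[order(alph)]

# Build a factor whose levels are in mixed (natural) order.
f <- factor(alph, levels = mixedsort(unique(alph)))

# Integer codes: the position of each element within the sorted levels.
as.integer(f)
# [1]  1  4  5  6  2  3  9  8  7 10

# order() sorts by those codes, which reproduces mixedsort(alph).
alph[order(f)]
```

Because the result is an ordinary factor, it can be passed to order() alongside any other vectors, which is what multi.mixedorder relies on.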

In the example below, multi.mixedorder gives the same result as the standard order, since V2 is numeric and has no ties, so V3 is never needed as a tie-breaker.

df <- data.frame(
    V1 = c("A","A","B","B","C","C","D","D","E","E"),
    V2 = 19:10,
    V3 = alph,
    stringsAsFactors = FALSE)

df[multi.mixedorder(df$V2, df$V3),]

   V1 V2  V3
10  E 10  12
9   E 11 11a
8   D 12 11b
7   D 13 11c
6   C 14   9
5   C 15   8
4   B 16 10c
3   B 17 10b
2   A 18 10a
1   A 19   7

Notice that

  • 19:10 is equivalent to c(19:10). c stands for combine (concatenate), that is, it makes one long vector out of many short ones, but in your case you only have one vector (19:10), so there is nothing to concatenate. In the case of V1, however, you have 10 vectors of length 1, so there you do need c, as you already use it.
  • You need stringsAsFactors=FALSE so that V1 and V3 are not converted to (incorrectly sorted) factors, which was the default before R 4.0.0.
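
The data from the question has ties in V2, which is the case where V3 actually has to break them. The following is a self-contained sketch of that case, assuming gtools is installed. It repeats the function definition so that it runs on its own, and it writes the recycled V2 out explicitly as rep(2:1, 5), which is my spelling rather than the question's.

```r
library(gtools)

# Same helper as above, repeated so this block is runnable by itself.
multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE) {
    do.call(order, c(
        lapply(list(...), function(l) {
            if (is.character(l)) factor(l, levels = mixedsort(unique(l))) else l
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}

alph <- c("7","10a","10b","10c","8","9","11c","11b","11a","12")

df <- data.frame(
    V1 = c("A","A","B","B","C","C","D","D","E","E"),
    V2 = rep(2:1, 5),          # 2,1,2,1,...: every value occurs five times
    V3 = alph,
    stringsAsFactors = FALSE)

# Sort by V2 first, then by V3 in mixed order within each V2 group.
sorted <- df[multi.mixedorder(df$V2, df$V3), ]
sorted$V3
# [1] "9"   "10a" "10c" "11b" "12"  "7"   "8"   "10b" "11a" "11c"
```

This matches the row order requested in the question.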
Backlin
  • I tried this solution but I can't figure out how to use it to sort by two columns (I edited in an example). – nebi Dec 05 '13 at 10:08
  • It appears `mixedorder` does not support multiple columns (how strange!), but I can hack you a roundabout solution. Been wanting to do a thing of this nature before. – Backlin Dec 05 '13 at 10:18
  • You're right, my example is really bad, sorry. But try your solution with V2 = 2:1 and it doesn't work anymore, does it? – nebi Dec 05 '13 at 10:58
  • Don't worry, many things in R are not obvious at first :) And you are correct in that it doesn't work if there are ties in `V2`. I'll take a look at it later today. – Backlin Dec 05 '13 at 11:03
  • There! Works now. The trick was to convert all characters to factors, which are sorted correctly with the standard `order` function. – Backlin Dec 05 '13 at 13:45
  • Great! I made some editing to include all your useful comments. Thanks! – nebi Dec 05 '13 at 20:43
  • This helped me, thanks. But ggplot insists on plotting in the unsorted order. – Tiago Bruno Oct 26 '15 at 00:12
  • @TiagoBruno Make sure you are using the newest version of ggplot2 that was recently released. The ordering of things has been a pain in older versions and although I have not tried it in the new version yet I know it has been massively rewritten and improved, so I hope they have sorted this out. – Backlin Oct 28 '15 at 09:41