Is there a natural sort for R?

Say I had a character vector like so:

seq.names <- c('abc21', 'abc2', 'abc1', 'abc01', 'abc4', 'abc201', '1b', '1a')

I'd like to sort it alphanumerically, so I get back this:

c('1a', '1b', 'abc1', 'abc01', 'abc2', 'abc4', 'abc21', 'abc201')

Does this exist somewhere, or should I start coding?

Gregor Thomas
cbare

2 Answers


I don't think "alphanumeric sort" means what you think it means.

In any case, looks like you want mixedsort, part of gtools.

> install.packages('gtools')
[...]
> require('gtools')
Loading required package: gtools
> seq.names
[1] "abc21"  "abc2"   "abc1"   "abc01"  "abc4"   "abc201" "1b"     "1a"
> mixedsort(seq.names)
[1] "1a"     "1b"     "abc1"   "abc01"  "abc2"   "abc4"   "abc21"  "abc201"
Nicholas Riley
  • Excellent! Is alphanumeric sort not the right term for this? Have I been calling it the wrong thing all along? – cbare May 06 '10 at 02:35
  • 1
    An alphanumeric sort would look like what R's sort() function returns. Each character is compared by its ASCII value, position by position, and smaller values sort first. In this case, "abc01" would come before "abc1" because the ASCII value of "0" (48) is smaller than that of "1" (49) at position 4. – beach May 06 '10 at 02:53
  • 6
    I've typically used the term "natural order sort" after one of the first widely used pieces of software to do this (http://www.naturalordersort.org/). Jeff Atwood even wrote a blog post about it (http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html). – Nicholas Riley May 06 '10 at 02:59
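
To make the distinction in the comments above concrete, here is a minimal base R sketch of the plain character-by-character ordering. It uses method = "radix" because, for character vectors, that method compares in the C locale; a bare sort() follows the session's collation rules and may order things differently on some machines. The variable name plain is only illustrative.

```r
seq.names <- c('abc21', 'abc2', 'abc1', 'abc01', 'abc4', 'abc201', '1b', '1a')

# method = "radix" sorts character vectors in the C locale, i.e. by
# comparing character codes position by position.
plain <- sort(seq.names, method = "radix")
print(plain)
```

Here "abc01" lands before "abc1" and "abc201" before "abc21", which is exactly the behaviour a natural sort is meant to avoid.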

Natural sorting is available in the stringr/stringi packages with the functions str_sort()/stri_sort(). Switching between alphanumeric and natural sorting is controlled by the 'numeric' argument.

library(stringr)
# library(stringi)

str_sort(seq.names, numeric = TRUE)
# stri_sort(seq.names, numeric = TRUE)

[1] "1a"     "1b"     "abc1"   "abc01"  "abc2"   "abc4"   "abc21"  "abc201"

The companion function str_order() / stri_order() returns the indices to arrange the vector in (by default) ascending order:

str_order(seq.names, numeric = TRUE)
# stri_order(seq.names, numeric = TRUE)

[1] 8 7 3 4 2 5 1 6

seq.names[str_order(seq.names, numeric = TRUE)]

[1] "1a"     "1b"     "abc1"   "abc01"  "abc2"   "abc4"   "abc21"  "abc201"
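
The index form is mostly useful when other data has to move along with the names. As a rough sketch, assuming a hypothetical data frame df whose value column simply records each name's original position, the rows can be reordered like this:

```r
library(stringr)

seq.names <- c('abc21', 'abc2', 'abc1', 'abc01', 'abc4', 'abc201', '1b', '1a')

# Hypothetical example data: `value` is just the original row position.
df <- data.frame(name = seq.names, value = seq_along(seq.names),
                 stringsAsFactors = FALSE)

# str_order() returns row indices, so it can be used directly to subset rows.
df.sorted <- df[str_order(df$name, numeric = TRUE), ]
print(df.sorted)
```

The same pattern should work with stri_order() from stringi in place of str_order().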
Ritchie Sacramento