I have a character vector of the following shape:

fld <- c('20*20', '100*100', '200*200', '50*50', '1000*1000', '250*250')

I need to sort the elements by the number before the asterisk.

sort(fld) gives:

[1] "100*100" "1000*1000" "20*20" "200*200" "250*250" "50*50"

instead of the desired:

[1] "20*20" "50*50" "100*100" "200*200" "250*250" "1000*1000"

I've come up with the following expression, which does the right thing:

fld[
  charmatch(  
    paste(
      as.character(sort(as.integer( 
        gsub('\\*.{2,4}', '', fld)
      ))),
      '*', sep = ''
    ),
    fld)
  ]

but I bet there is a shorter, easier, more natural way...

Pawel
  • Since this question is no longer flagged as a duplicate, it may be useful to know that there is also a more general question with a good answer: https://stackoverflow.com/questions/2778039/how-to-perform-natural-sorting – Pawel Jun 02 '17 at 07:00

2 Answers

A base R approach:

fld[order(as.numeric(sub("\\*.*", "", fld)))]
#[1] "20*20"     "50*50"     "100*100"   "200*200"   "250*250"   "1000*1000"

This removes the * and everything after it from each element of fld, converts what remains to numeric, and computes its ordering with order(). The resulting index is then used to reorder the original vector.
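To make the individual steps visible, the one-liner can be unrolled roughly as follows. This is only an illustrative sketch; the intermediate names prefix, num, idx and sorted_fld are made up here and are not part of the original answer.

```r
fld <- c('20*20', '100*100', '200*200', '50*50', '1000*1000', '250*250')

# Step 1: strip the "*" and everything after it, leaving the leading number as text
prefix <- sub("\\*.*", "", fld)

# Step 2: convert to numeric so values are compared by magnitude,
# not character by character as sort() does on strings
num <- as.numeric(prefix)

# Step 3: order() returns the positions that would sort num ascending
idx <- order(num)

# Step 4: use those positions to reorder the original strings
sorted_fld <- fld[idx]
print(sorted_fld)
```

Keeping idx around can be handy if other vectors of the same length need to be reordered in the same way.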

Just for good measure, here's another way of extracting the leading number from each element (digits only):

fld[order(as.numeric(sub("^(\\d+)(.*)", "\\1", fld)))]
#[1] "20*20"     "50*50"     "100*100"   "200*200"   "250*250"   "1000*1000"
talat
  • @akrun By deleting your answer you've made my choice of which answer to accept much simpler :) You're right that the level of generality of the question was not fully clear. In fact, I will benefit a lot from knowing `gtools::mixedsort()`, so thank you. – Pawel Jun 01 '17 at 10:53

We can use parse_number from readr. It extracts the number before the * from each element; order() then gives the index that sorts those numbers, and that index is used to reorder the original vector.

library(readr)
fld[order(parse_number(fld))]
#[1] "20*20"     "50*50"     "100*100"   "200*200"   "250*250"   "1000*1000"

A more efficient approach is to extract the first run of digits with stri_extract_first_regex from stringi, convert it to integer, and order the original vector by the result.

library(stringi)
fld[order(as.integer(stri_extract_first_regex(fld, "[0-9]+")))]
#[1] "20*20"     "50*50"     "100*100"   "200*200"   "250*250"   "1000*1000"
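If installing stringi is not an option, the same "first run of digits" extraction can probably be sketched in base R with regexpr() and regmatches(). This is an assumption-laden alternative rather than part of the original answer, and the names first_num and sorted_fld are made up for illustration. Note that regmatches() silently drops elements without a match, so this sketch assumes every element contains at least one digit.

```r
fld <- c('20*20', '100*100', '200*200', '50*50', '1000*1000', '250*250')

# regexpr() locates the first match of one or more digits in each element;
# regmatches() pulls out the matched substrings
# (assumes every element has a match, otherwise the lengths would differ)
first_num <- as.integer(regmatches(fld, regexpr("[0-9]+", fld)))

# reorder the original strings by the extracted integers
sorted_fld <- fld[order(first_num)]
print(sorted_fld)
```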
akrun