
I have a list of data tags as strings, much like this:

data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

Note that some numbers, including "1", are sometimes missing. A normal sort, of course, puts the ABCD tags before the WXYZ tags, and then puts ABCD 11 before ABCD 2.
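For reference, a plain base-R sort of the example vector shows both problems. This is only an illustration: `sort()` compares the tags as strings, so "ABCD 11" lands before "ABCD 2" and every ABCD tag comes before every WXYZ tag.

```r
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

# Plain string sort: character-by-character comparison, so "11" sorts before "2"
# and the ABCD prefix sorts before WXYZ
plain <- sort(data)
print(plain)
# "ABCD 11" "ABCD 2" "ABCD 3" "ABCD 4" "WXYZ 1" "WXYZ 3" "WXYZ 4" "WXYZ 5"
```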

I can easily overcome the numbering issue with gtools::mixedsort. But, for reasons of problem-specific context, I also want the WXYZ tags to come before the ABCD ones.

For example, when the data vector above is sorted as I need it, it should look like this:

dataSorted <- c("WXYZ 1", "WXYZ 3", "WXYZ 4", "WXYZ 5", "ABCD 2", "ABCD 3", "ABCD 4", "ABCD 11")

Thankfully, I only need to deal with those two types of tags now, but I figure I should ask for a general solution. Is there a way to make gtools::mixedsort use reverse alphabetical but normal numeric ordering? If I set decreasing = TRUE, it also reverses the numeric order.

Right now I am just using a manually written list to force the order. That is not only inelegant, but since the numbers on the tags have no theoretical upper limit, it is eventually going to break.

The Count

2 Answers


We can extract the non-digit and digit parts separately, then order by both, after converting the non-digit part to a factor whose levels are given in the desired order:

data[order(factor(sub("\\s+\\d+", "", data), 
   levels = c("WXYZ", "ABCD")), as.integer(sub("\\S+\\s+", "", data)))]

Output:

[1] "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5" 
[5] "ABCD 2"  "ABCD 3"  "ABCD 4"  "ABCD 11"
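If the prefixes should simply run in reverse alphabetical order, rather than in an order listed by hand in levels, a possible variant is to let order() sort the two keys in opposite directions. This is a sketch assuming every tag has the form "<letters> <digits>". With method = "radix", base R's order() accepts one decreasing value per key; note that the radix method compares strings in the C locale.

```r
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

# Non-digit prefix and numeric suffix, extracted much as in the answer above
# (assumes each tag is "<letters> <digits>")
prefix <- sub("\\s+\\d+$", "", data)
number <- as.integer(sub("^\\S+\\s+", "", data))

# One `decreasing` flag per key (supported by method = "radix" only):
# prefix descending (reverse alphabetical), number ascending
dataSorted <- data[order(prefix, number,
                         decreasing = c(TRUE, FALSE), method = "radix")]
print(dataSorted)
```

Unlike the factor approach, this needs no list of prefixes, so new tag types are handled automatically; the trade-off is that the prefix order is fixed to reverse alphabetical.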
akrun
    This looks perfect. I have to wait a little while before I can try to apply it (many cooks in the kitchen), but I will accept it if it works! Thank you! – The Count Nov 19 '21 at 17:29

This works without any predefined levels or manually entered data. The only prerequisite is that the first part of each tag is a letter string and the second a number (a missing number becomes NA).

First, split the strings at the space, then group by the letter part and sort the numbers within each group. Finally, bring the two parts back together.

# split
dat <- setNames( data.frame( t(data.frame( strsplit( data, " " ) )[1,]),
  as.numeric( data.frame( strsplit( data, " " ) )[2,]) ), c("A","B") )
#                   A  B
#c..ABCD....2..  ABCD  2
#c..ABCD....3..  ABCD  3
#c..WXYZ....1..  WXYZ  1
#c..WXYZ....5..  WXYZ  5
#c..WXYZ....3..  WXYZ  3
#c..WXYZ....4..  WXYZ  4
#c..ABCD....4..  ABCD  4
#c..ABCD....11.. ABCD 11

# group and order
dat_agr <- aggregate( B ~ A, dat, na.action=na.pass, 
  function(x)sort(x, na.last=T), simplify=F )
dat_ord <- dat_agr[order(dat_agr[,"A"], decreasing=T),]
#     A           B
#2 WXYZ  1, 3, 4, 5
#1 ABCD 2, 3, 4, 11

# bring back together
unlist(lapply( dat_ord$A, function(x) sapply( 
  dat_ord[grep(x, dat_ord$A),"B"], function(y) paste(x,y) ) ))
[1] "WXYZ 1"  "WXYZ 3"  "WXYZ 4"  "WXYZ 5"  "ABCD 2"  "ABCD 3"  "ABCD 4" 
[8] "ABCD 11"
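The same split, group and reassemble steps can also be sketched more compactly with base R's split(), which returns a named list holding the numbers for each letter prefix. As above, this assumes each tag is "<letters> <number>"; the variable names are only illustrative.

```r
data <- c("ABCD 2", "ABCD 3", "WXYZ 1", "WXYZ 5", "WXYZ 3", "WXYZ 4", "ABCD 4", "ABCD 11")

# split: letter part and numeric part of each tag
parts        <- strsplit(data, " ")
letters_part <- vapply(parts, `[`, character(1), 1)
numbers_part <- as.numeric(vapply(parts, `[`, character(1), 2))

# group: one numeric vector per prefix, prefixes in reverse alphabetical order
groups <- split(numbers_part, letters_part)
groups <- groups[order(names(groups), decreasing = TRUE)]

# bring back together: sort numbers within each group and rebuild the tags
result <- unlist(lapply(names(groups), function(g)
  paste(g, sort(groups[[g]], na.last = TRUE))), use.names = FALSE)
print(result)
```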
Andre Wildberg