I am trying to sort a data frame by codes contained in one column.

The logic behind these codes is:

S/number/number/number/letter (e.g. S120B). There are not always three digits (e.g. S10K), and the letter is not always present (e.g. S2).

The first code is S1, and the list runs up to S999, after which it continues with S1A. It then runs up to S999A, continues with S1B, and so on.

Furthermore, the column also contains completely different codes, such as W23, E100, etc., which should be grouped together.

How can I order the dataframe according to this pretty sick ordering scheme?

MWE: codes <- c("S1", "S20D", "S550C", "S88A", "S420K", "E44", "W22")

Mollan

2 Answers

1. Create a minimal reproducible example ;)

mre <- data.frame(ID = c("S1", "S20D", "S550C", "S88A", "S420K", "E44", "W22"),
                stringsAsFactors = FALSE)

Now, I am not sure what you mean by:

Furthermore, the column also contains completely different codes, such as W23, E100, etc., which should be grouped together.

If you mean that "W23" should be read and sorted completely differently from "S999", we need additional information on how to distinguish between the two cases. Otherwise, this should work:

2. Suggested solution: alphabetical sorting

library(dplyr)
mre %>% 
  arrange(ID)


     ID
1   E44
2    S1
3  S20D
4 S420K
5 S550C
6  S88A
7   W22

Or using only base R:

mre[order(mre$ID),]
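
Note that alphabetical sorting compares the IDs character by character, so "S20D" lands before "S88A" and the numeric part is never compared as a number. If the scheme in the question is meant literally (trailing letter first, then the number), a minimal base R sketch could split each code into parts and order on those. The helper columns prefix, number and suffix are hypothetical names, and placing the non-S codes after the S codes is an assumption:

```r
mre <- data.frame(ID = c("S1", "S20D", "S550C", "S88A", "S420K", "E44", "W22"),
                  stringsAsFactors = FALSE)

# Hypothetical helper columns: leading letter, digits, optional trailing letter
mre$prefix <- substr(mre$ID, 1, 1)
mre$number <- as.numeric(gsub("[^0-9]", "", mre$ID))
mre$suffix <- sub("^[A-Za-z][0-9]+", "", mre$ID)  # "" when there is no trailing letter

# Assumption: S codes come first, other prefixes follow alphabetically.
# Within a prefix, sort by suffix ("" < "A" < "B" ...) and then by number.
sorted <- mre[order(mre$prefix != "S", mre$prefix, mre$suffix, mre$number), ]
sorted$ID
# [1] "S1"    "S88A"  "S550C" "S20D"  "S420K" "E44"   "W22"
```

Ordering on several keys keeps the rule readable: each argument to order() only breaks ties left by the previous one.
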
dario

Following your directions, this is a customized function:

codes <- c("S1", "S20D", "E44", "S550C", "S88A", "S420K", "W22")
complex_order <- function(codes) {
  # Create empty order vector
  final_order <- rep(NA_integer_, length(codes))
  # First take into account codes that do not match the S convention; they go last
  not_in_convention <- tolower(substr(codes, 1, 1)) != "s"
  n_s <- sum(!not_in_convention)
  final_order[seq_along(codes) > n_s] <- which(not_in_convention)

  # Then check the ones that have a letter at the end
  last_char <- tolower(substr(codes, nchar(codes), nchar(codes)))
  letter_at_end <- last_char %in% letters & !not_in_convention
  for (idx in which(letter_at_end)) {
    lettr_value <- match(last_char[idx], letters) * 1000 # Every letter means 1000 positions ahead
    # Keep the numeric part so that codes sharing a letter are still ordered by number
    number <- as.numeric(substr(codes[idx], 2, nchar(codes[idx]) - 1))
    codes[idx] <- paste0("S", lettr_value + number)
  }

  # Now that all S codes share the same format, order them by value
  values <- as.numeric(substr(codes[!not_in_convention], 2, nchar(codes[!not_in_convention])))
  final_order[seq_len(n_s)] <- which(!not_in_convention)[order(values)]
  final_order
}
codes[complex_order(codes)]
[1] "S1"    "S88A"  "S550C" "S20D"  "S420K" "E44"   "W22" 
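
The "every letter means 1000 positions ahead" idea can also be written without a loop, as a single numeric key per code: the letter offset plus the numeric part, so that codes sharing a letter are still ordered by number. This is only a sketch; sort_key() is a hypothetical helper, and giving every non-S code the key Inf (so they go last, in their original order) is an assumption:

```r
codes <- c("S1", "S20D", "E44", "S550C", "S88A", "S420K", "W22")

# sort_key() is a hypothetical helper, not part of the function above
sort_key <- function(codes) {
  is_s   <- grepl("^[Ss][0-9]+[A-Za-z]?$", codes)
  number <- as.numeric(gsub("[^0-9]", "", codes))
  suffix <- toupper(sub("^.*[0-9]", "", codes))          # trailing letter or ""
  offset <- match(suffix, LETTERS, nomatch = 0) * 1000   # no letter -> offset 0
  ifelse(is_s, offset + number, Inf)                     # assumption: non-S codes go last
}

codes[order(sort_key(codes))]
# [1] "S1"    "S88A"  "S550C" "S20D"  "S420K" "E44"   "W22"
```

For example, "S20D" gets the key 4 * 1000 + 20 = 4020, which places it after "S550C" (3550) and before "S420K" (11420).
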

Hope it helps!

JaiPizGon