
I'm trying to get this order:

ID
Q1                  
Q2              
Q3              
Q4
Q10 

But I'm getting this order with dplyr::arrange:

Q1  
Q10             
Q2              
Q3              
Q4

Reproducible Example:

library(dplyr)   # arrange(), %>%
library(tibble)  # tribble()

df <- tribble(~ID,
        "Q1",
        "Q2",
        "Q3",
        "Q4",
        "Q10")
df %>%
  arrange(ID)
M--
writer_typer
  • Since you have `tidyverse`, use `str_sort(x, numeric = TRUE)` – Onyambu Jul 07 '22 at 19:47
  • The answers below (and the linked potential duplicate) don't address the situation where there is more than one column (which, as I understand it, is implicitly the question here). In that case a possible solution is to convert the ID column into a factor: `df |> mutate(ID = factor(ID, str_sort(unique(ID), numeric = TRUE))) |> arrange(ID)`. In other words, I believe the question should be reopened. – harre Jul 07 '22 at 20:00
  • I agree, but I wasn't surprised that the question was closed knowing how stack overflow works. – writer_typer Jul 07 '22 at 20:10
  • The `str_sort` option doesn't seem to work in the data frame on its own. Perhaps it needs to be part of another approach to work. – writer_typer Jul 07 '22 at 20:14

2 Answers


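A possible solution, based on gtools::mixedorder (the ordering counterpart of gtools::mixedsort), so that whole rows are reordered rather than only the ID column:

library(dplyr)

# mixedorder() returns row positions in natural (alphanumeric) order;
# slice() then picks the rows in that order, keeping other columns aligned
df %>% 
  slice(gtools::mixedorder(ID))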

#> # A tibble: 5 × 1
#>   ID   
#>   <chr>
#> 1 Q1   
#> 2 Q2   
#> 3 Q3   
#> 4 Q4   
#> 5 Q10

Or, without any library beyond dplyr:

library(dplyr)

df %>% 
  arrange(as.numeric(gsub("\\D+", "", ID)))

#> # A tibble: 5 × 1
#>   ID   
#>   <chr>
#> 1 Q1   
#> 2 Q2   
#> 3 Q3   
#> 4 Q4   
#> 5 Q10
PaulS

Another option, using only dplyr and base R. It assumes each value contains a single, contiguous run of digits: all digits in a value are concatenated, so "A1B2C" would sort here as 12.

dat %>%
  arrange(suppressWarnings(as.integer(gsub("\\D", "", ID))))
#    ID
# 1  Q1
# 2  Q2
# 3  Q3
# 4  Q4
# 5 Q10

If you prefer to use only the first number when a value contains several, use readr::parse_number:

dat %>%
  arrange(suppressWarnings(readr::parse_number(ID)))
#    ID
# 1  Q1
# 2  Q2
# 3  Q3
# 4  Q4
# 5 Q10
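
To make the difference between the two sort keys concrete, here is a small sketch on a made-up vector (the `ids` values are illustrative and not part of `dat`). It assumes `readr` is installed.

```r
# Hypothetical values, chosen to include an ID with two separate numbers
ids <- c("Q10", "A1B2C", "Q2")

# Strip every non-digit, then convert: digits from separate runs are joined
digits_joined <- as.integer(gsub("\\D", "", ids))
digits_joined
# [1] 10 12  2

# parse_number() keeps only the first number it finds in each string
first_number <- readr::parse_number(ids)
first_number
# [1] 10  1  2
```

Either vector can be passed to `arrange()` as the sort key; which one is appropriate depends on how IDs with more than one number should be ranked.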

I keep suppressWarnings in both versions because any value with no digits produces a noisy warning, as in

readr::parse_number("Q")
# Warning: 1 parsing failure.
# row col expected actual
#   1  -- a number      Q
# [1] NA
# attr(,"problems")
# # A tibble: 1 x 4
#     row   col expected actual
#   <int> <int> <chr>    <chr> 
# 1     1    NA a number Q     

For the sake of this example, we don't care about the "problems" attribute attached to the output, and getting NA for such a value is acceptable.


Data

dat <- structure(list(ID = c("Q2", "Q3", "Q4", "Q10", "Q1")), row.names = c(2L, 3L, 4L, 5L, 1L), class = "data.frame")
r2evans
  • Does the function `str_sort(x, numeric = TRUE)` solve the issue? – Onyambu Jul 07 '22 at 19:58
  • No, it returns `c("Q2", "Q3", "Q4", "Q10", "Q1")` – r2evans Jul 08 '22 at 13:05
  • It clearly gives `"Q1" "Q2" "Q3" "Q4" "Q10"` as the result and not what you just stated – Onyambu Jul 08 '22 at 13:08
  • `str_sort(dat)` (wrong) returns `[1] "c(\"Q2\", \"Q3\", \"Q4\", \"Q10\", \"Q1\")"`. `str_sort(dat$ID)` returns `c("Q1", "Q10", "Q2", "Q3", "Q4")`. What do you think I'm doing wrong? Note that the original question had strings already ordered, my `dat` does not (in order to demonstrate sorting it). – r2evans Jul 08 '22 at 13:11
  • `str_sort(dat$ID, numeric = TRUE)` is what I wrote – Onyambu Jul 08 '22 at 13:13
  • Yes, I am using the unordered data. Or simply shuffle it with `sample`, i.e. `str_sort(sample(dat$ID), numeric = TRUE)`; it will always return the correct order – Onyambu Jul 08 '22 at 13:15
  • `stringr::str_sort(dat$ID, numeric=TRUE)` does return `c("Q1", "Q2", "Q3", "Q4", "Q10")`, but `arrange(dat, stringr::str_sort(ID, numeric=TRUE))$ID` returns `c("Q2", "Q1", "Q3", "Q4", "Q10")` and `arrange(dat, stringr::str_order(ID, numeric=TRUE))$ID` returns `c("Q3", "Q4", "Q10", "Q1", "Q2")`. However, `slice(dat, stringr::str_order(ID, numeric=TRUE))` works. – r2evans Jul 08 '22 at 13:17
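
The behaviour described in the last comment follows from how the two verbs read their arguments: `arrange()` treats the expression as a per-row sort key, whereas `slice()` treats it as row positions. `str_order()` returns positions, so it pairs with `slice()`. A minimal sketch, assuming `stringr` is installed; the `score` column and the name `dat2` are made up here to show that other columns stay attached to their IDs.

```r
library(dplyr)
library(stringr)

# Same unordered IDs as `dat`, plus a hypothetical second column
dat2 <- data.frame(
  ID    = c("Q2", "Q3", "Q4", "Q10", "Q1"),
  score = c(20, 30, 40, 100, 10)
)

# str_order() gives the row positions in natural order; slice() picks them
sorted <- dat2 %>%
  slice(str_order(ID, numeric = TRUE))

sorted$ID
# [1] "Q1"  "Q2"  "Q3"  "Q4"  "Q10"
sorted$score
# [1]  10  20  30  40 100
```

If you would rather stay with `arrange()`, the key has to be something that sorts correctly row by row, such as the numeric keys from the answers above or the factor from harre's comment.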