I have a problem with some R code: sports data spanning a number of years is not sorted in a logical order. I have a dataset with 42 variables and almost 80,000 cases; a paraphrased sample is below:

dat <- c(2020, 2020, 2020, 2020, 2020, 2020, 2020)
r <- c("QF", "R1", "R15", "R2", "R25", "R3", "SF")
data <- data.frame(dat, r)

Obviously, each case has just one of the round values, not all of them, and the real data has far more than 26 cases.

The problem is that rather than being ordered R1-R25, followed by QF, SF and GF, the data is ordered GF, QF, R1, R10-R19, R2, R21-R25, R3-R9, SF. This is obviously because the values are sorted as text: the first digit after the R decides the order, and the letters sort alphabetically.
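A small sketch of that behaviour, using a hypothetical subset of round labels rather than the real data. `method = "radix"` is used here only so the result does not depend on the locale; a plain `sort()` would be expected to give the same order for these labels in most locales.

```r
# Hypothetical subset of round labels, for illustration only
rounds <- c("R1", "R2", "R3", "R10", "R15", "R25", "QF", "SF", "GF")

# Character vectors are compared character by character, so "R10" and "R15"
# land before "R2" because "1" sorts before "2"
sorted_rounds <- sort(rounds, method = "radix")
print(sorted_rounds)
```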

This is how I want it to look, but I can't go through 80,000 cases manually like this:

dat <- c(2020, 2020, 2020, 2020, 2020, 2020, 2020)
r <- c("R1", "R2", "R3", "R15", "R25", "QF", "SF")
data <- data.frame(dat, r)

Thanks :)

  • [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. It's hard to do any more than guess based on what you've posted, but you probably just need to make a factor – camille Oct 09 '21 at 00:25
  • edited... hopefully that helps camille – user1249891235907 Oct 09 '21 at 00:32
  • Does this answer your question? [How to sort a character vector where elements contain letters and numbers in R?](https://stackoverflow.com/questions/17531403/how-to-sort-a-character-vector-where-elements-contain-letters-and-numbers-in-r) – camille Oct 09 '21 at 00:59

3 Answers

Here's a tidyverse solution:

library(tidyverse)

data %>%
  # str_sort() returns the sorted values of r; numeric = TRUE compares digit runs as numbers
  mutate(r = str_sort(r, numeric = TRUE))

Edit:

To arrange as "R, Q, S", you can substring your r variable and apply a custom sort using arrange and match:

data %>%
  mutate(r = str_sort(r, numeric = TRUE)) %>%
  # order by first letter: regular rounds ("R"), then "Q", then "S"
  arrange(match(str_sub(r, 1, 1), c("R", "Q", "S")))

This gives us:

   dat   r
1 2020  R1
2 2020  R2
3 2020  R3
4 2020 R15
5 2020 R25
6 2020  QF
7 2020  SF
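One caveat worth hedging: `mutate(r = str_sort(r, ...))` overwrites r with its sorted values and leaves the other columns where they were. In the sample that is harmless because dat is constant, but in a wider data set the rows probably need to move as a unit. A minimal sketch of that, assuming a hypothetical extra column `score` (not in the original data) and using `stringr::str_order()` to get row indices instead of sorted values:

```r
library(stringr)

# Sample data plus a hypothetical `score` column, added only to show
# that whole rows travel together
data <- data.frame(
  dat   = rep(2020, 7),
  r     = c("QF", "R1", "R15", "R2", "R25", "R3", "SF"),
  score = 1:7
)

# Row indices that put r in natural (numeric-aware) order
idx <- str_order(data$r, numeric = TRUE)
sorted <- data[idx, ]

# Then move the groups into "R", "Q", "S" order; order() leaves ties
# in their existing order, so R1..R25 stay numerically sorted
sorted <- sorted[order(match(substr(sorted$r, 1, 1), c("R", "Q", "S"))), ]
print(sorted)
```

Each `score` value stays attached to the round it started with, which is the property to check before applying this to all 42 variables.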
Matt

Since you want "QF" and "SF" at the end, one option is to extract the number from the r column and order by it. "QF" and "SF" contain no digits, so str_extract() returns NA for them, and order() places NA values last by default.

result <- data[order(as.numeric(stringr::str_extract(data$r, '\\d+'))), ]

#   dat   r
#2 2020  R1
#4 2020  R2
#6 2020  R3
#3 2020 R15
#5 2020 R25
#1 2020  QF
#7 2020  SF
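Because the knockout rounds all tie at NA, they keep whatever relative order they already had, so "GF" could end up ahead of "QF". If the sequence has to be R1-R25, then QF, SF, GF as described in the question, one option (along the lines of the factor suggestion in the comments) is to spell the order out as factor levels. This is only a sketch: the level list is assumed from the question's description, and any label not in it would become NA.

```r
# Assumed full ordering of rounds, taken from the question's description
round_levels <- c(paste0("R", 1:25), "QF", "SF", "GF")

# Sample data with "GF" added for illustration
data <- data.frame(
  dat = rep(2020, 8),
  r   = c("GF", "QF", "R1", "R15", "R2", "R25", "R3", "SF")
)

# A factor sorts by the position of each value in `levels`, not alphabetically
data$r <- factor(data$r, levels = round_levels)

# Order by year first, then by round within each year
result <- data[order(data$dat, data$r), ]
print(result)
```

Running `anyNA(data$r)` after the conversion is a quick way to spot labels that were missing from the assumed level list.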
Ronak Shah

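We may use readr::parse_number, which extracts the first number from each string. "QF" and "SF" contain no number, so they parse as NA, and arrange() places NA values last.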

library(dplyr)
data %>% 
   arrange(readr::parse_number(r))

Output:

   dat   r
1 2020  R1
2 2020  R2
3 2020  R3
4 2020 R15
5 2020 R25
6 2020  QF
7 2020  SF
akrun