0

I would like to sort a df like this one:

library(dplyr)
tags <- c("F23-F45", "A69-1008-1", "A69-1008-10", "A69-1008-10", "A69-1008-100", "A69-1008-12")
animal_names <- c("Dim","Dami", "Pet", "Nic", "Li", "Tan")

df <- tibble(tag = tags, animal_name = animal_names)
# A tibble: 6 x 2
  tag          animal_name
  <chr>        <chr>      
1 F23-F45      Dim        
2 A69-1008-1   Dami       
3 A69-1008-10  Pet        
4 A69-1008-10  Nic        
5 A69-1008-100 Li         
6 A69-1008-12  Tan

I would like to sort first by tag and then by animal_name. Typically I do it with arrange()function of dplyr package. It would result in this df:

> df %>% arrange(tag, animal_name)
# A tibble: 6 x 2
  tag          animal_name
  <chr>        <chr>      
1 A69-1008-1   Dami       
2 A69-1008-10  Nic        
3 A69-1008-10  Pet        
4 A69-1008-100 Li         
5 A69-1008-12  Tan        
6 F23-F45      Dim        

However, I would like to apply a kind of numeric sort so that A69-1008-12 occurrs before A69-1008-100. This problem has been solved for a vector already (see this question) but I wonder how to transfer it to data.frames?

damianooldoni
  • 475
  • 5
  • 6

5 Answers5

4

In stringr 1.5 you can use arrange(df, str_rank(x,nummeric=T))

2

Inspired by https://stackoverflow.com/a/59730594:

This can be done with arrange() by converting df$tag to a factor first, with levels in a predefined order:

library(dplyr)
library(stringr)

# Create a vector of tags in the desired order
sorted_tags <-
  df %>%
  distinct(tag) %>%        # Unique tags only
  pull() %>%               # Convert to vector
  str_sort(numeric = TRUE) # Sort as numeric values
sorted_tags
#[1] "A69-1008-1"   "A69-1008-10"  "A69-1008-12"  "A69-1008-100" "F23-F45"

# Sort the df with arrange(), but use tag as a factor with sorted_tags as levels
df %>%
  arrange(
    factor(tag, levels = sorted_tags),
    animal_name
  )
# A tibble: 6 x 2
#  tag          animal_name
#  <chr>        <chr>      
#1 A69-1008-1   Dami       
#2 A69-1008-10  Nic        
#3 A69-1008-10  Pet        
#4 A69-1008-12  Tan        
#5 A69-1008-100 Li         
#6 F23-F45      Dim        
1

Package stringr's sorting functions have an argument numeric that when set to TRUE sort the data numerically, even if they are mixed character/numeric data.

i <- stringr::str_order(df$tag, numeric = TRUE)
df[i, ]
## A tibble: 5 x 2
#  tag          animal_name
#  <chr>        <chr>      
#1 A69-1008-1   Damiano    
#2 A69-1008-10  Nico       
#3 A69-1008-10  Peter      
#4 A69-1008-100 Tanja      
#5 A69-1008-12  Lien     

Edit

To answer to the OP's comment asking for what already is in the question (my emphasis),

Yes, this is what I first tried and it works great with vectors, but how to order a data.frame by tag AND animal_name using str_sort?

the following dplyr pipe will solve the problem. The trick is to note that str_order will produce an order vector with no ties, even if they are present in the input vector. This will make a second column irrelevant. Therefore, the code groups by the input vector tag, then mutates the order vector i making each group equal to its first element. And now the dataframe can be sorted with arrange.

df %>%
  mutate(i = stringr::str_order(df$tag, numeric = TRUE)) %>%
  group_by(tag) %>%
  mutate(i = first(i)) %>%
  arrange(i, animal_name) %>%
  select(-i)
## A tibble: 5 x 2
## Groups:   tag [4]
#  tag          animal_name
#  <chr>        <chr>      
#1 A69-1008-1   Damiano    
#2 A69-1008-10  Nico       
#3 A69-1008-10  Peter      
#4 A69-1008-12  Tanja      
#5 A69-1008-100 Lien     
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Yes, this is what I first tried and it works great with vectors, but how to order a data.frame by `tag` AND `animal_name` using `str_sort`? – damianooldoni Aug 28 '20 at 11:50
0

I am not sure why gtools::mixedorder doesn't seem to give expected output in this case.

Here is an alternative by removing all non-numeric values from tag and ordering the data.

In base R, we can do :

df[with(df, order(as.numeric(gsub('\\D', '', tag)), animal_name)), ]

The same can be used in arrange as well :

library(dplyr)
df %>% arrange(order(as.numeric(gsub('\\D', '', tag))), animal_name)

#  tag          animal_name
#  <chr>        <chr>      
#1 A69-1008-1   Damiano    
#2 A69-1008-10  Peter      
#3 A69-1008-10  Nico       
#4 A69-1008-12  Tanja      
#5 A69-1008-100 Lien       
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • `gtools::mixedorder` is good for vectors, but how to using it for sorting data.frames by multiple columns? The solution with `as.numeric` would not work in general case where tag column contains tags with not only different digits but also different characters. – damianooldoni Aug 28 '20 at 11:52
  • I guess `mixedorder` doesn't work because of double `"-"` here. Can you update your post with sample of your actual data? I guess we have to use some sort of regex to achieve the desired output. – Ronak Shah Aug 28 '20 at 12:09
  • I updated the post adding a case where tag has different syntax. The case shown in example is actual data. I just removed not relevant columns. – damianooldoni Aug 28 '20 at 12:53
0

I found a solution to my answer actually. I could achieve what I wanted by using function mixedsortDF from jamba package:

mixedSortDF(df)
# A tibble: 6 x 2
  tag          animal_name
  <chr>        <chr>      
1 A69-1008-1   Dami       
2 A69-1008-10  Nic        
3 A69-1008-10  Pet        
4 A69-1008-12  Tan        
5 A69-1008-100 Li         
6 F23-F45      Dim     

The only drawback is that if df has more columns and you want to order by a subset of them, you cannot provide column names but only column numbers:

df$extra <- c(1,3,5,4,10,3)
sort_by <- c("animal_name", "tag")
library(purrr)
sort_by_col_numbers <- map_dbl(sort_by, ~which(colnames(df) %in% .)) # to get column numbers always in right order

mixedSortDF(df, byCols = sort_by_col_numbers)
# A tibble: 6 x 3
  tag          animal_name extra
  <chr>        <chr>       <dbl>
1 A69-1008-1   Dami            3
2 F23-F45      Dim             1
3 A69-1008-100 Li             10
4 A69-1008-10  Nic             4
5 A69-1008-10  Pet             5
6 A69-1008-12  Tan             3
damianooldoni
  • 475
  • 5
  • 6