
I have a data.frame with a character column whose values look like the following (the prefix is the season and the suffix is the year):

Wi_1984,
Su_1985,
Su_1983,
Wi_1982,
Su_1986,
Su_1984,

I want to keep the column type and format as they are, but order the data frame by this column in ascending year order (with the season deciding the order within a year). So I would like to produce:

Wi_1982,
Su_1983,
Su_1984,
Wi_1984,
Su_1985,
Su_1986,

Normal sorting arranges by the Wi_/Su_ prefix rather than by the _1984 (year) suffix. Any help much appreciated. If this could be done in dplyr / tidyverse, that would be grand.

Axeman
Drew
    Yes. What have you tried? For a task like this, try doing it by hand with pencil and paper. Try to formulate which steps you need to take in order to complete your task. – MrGumble Aug 01 '19 at 12:30
    Split into 2 columns, then use `order` as usual: `df1[order(df1$Year, df1$Season), ]`; we don't need yet another magic function for this. – zx8754 Aug 01 '19 at 13:09
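A minimal base R sketch of the split-then-order approach from the comment, assuming a data frame `df1` with a character column `col1` (both names are illustrative; the values are the ones from the question). The season and year pieces are only used to compute the row order, so `col1` itself stays a character column in its original format:

```r
# Illustrative data using the values from the question
df1 <- data.frame(
  col1 = c("Wi_1984", "Su_1985", "Su_1983", "Wi_1982", "Su_1986", "Su_1984"),
  stringsAsFactors = FALSE
)

# Split each "Season_Year" string on the underscore
parts  <- strsplit(df1$col1, "_", fixed = TRUE)
season <- vapply(parts, `[`, character(1), 1)
year   <- as.integer(vapply(parts, `[`, character(1), 2))

# Order by year first, then by season to break ties within the same year
sorted <- df1[order(year, season), , drop = FALSE]
sorted$col1
# [1] "Wi_1982" "Su_1983" "Su_1984" "Wi_1984" "Su_1985" "Su_1986"
```

Converting the year to integer keeps the sort numeric rather than relying on every year having the same number of digits.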

3 Answers


We can use parse_number to get the numeric part and use that in arrange:

library(dplyr)
library(readr)
df1 %>%
   arrange(parse_number(col1))

Or, if numbers can also appear in the prefix, extract the trailing digits instead (str_extract comes from stringr):

library(stringr)
df1 %>%
  arrange(as.numeric(str_extract(col1, "\\d+$")))
akrun
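Rows that tie on the year keep their input order when sorting on the year alone, so with the question's data Wi_1984 ends up ahead of Su_1984, whereas the desired output lists Su_1984 first. A sketch of one way to break the tie, assuming the same `df1`/`col1` names as above and illustrative data built from the question: add the full string as a second key, which within a shared year amounts to sorting on the season prefix.

```r
library(dplyr)
library(readr)

# Illustrative data using the values from the question
df1 <- tibble(
  col1 = c("Wi_1984", "Su_1985", "Su_1983", "Wi_1982", "Su_1986", "Su_1984")
)

sorted <- df1 %>%
  # Primary key: the year parsed from the string;
  # secondary key: the string itself, so the season prefix breaks ties
  arrange(parse_number(col1), col1)

sorted$col1
# [1] "Wi_1982" "Su_1983" "Su_1984" "Wi_1984" "Su_1985" "Su_1986"
```

The column itself is never modified, which matches the requirement to keep its type and format.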

In base R, we can extract the numeric part with sub and pass it to order:

df[order(as.integer(sub(".*?(\\d+)", "\\1", df$col))), ]
Ronak Shah

Based on @zx8754's comment, you can do:

library(dplyr)
library(tidyr)  # separate() comes from tidyr, not dplyr

df %>% 
 separate(X1, into = c('season', 'year')) %>% 
 arrange(year, season)  # sort by year, then by season within a year

which gives:

# A tibble: 6 x 2
  season year 
  <chr>  <chr>
1 Wi     1982 
2 Su     1983 
3 Su     1984 
4 Wi     1984 
5 Su     1985 
6 Su     1986
Sotos
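The question asks to keep the original column as it is, while separate() replaces it by default. A sketch of the same idea that keeps it: pass remove = FALSE so `X1` survives alongside the helper columns, sort, then drop the helpers. It assumes the column is named `X1` as in the answer and uses illustrative data from the question; convert = TRUE turns the year into an integer so the sort is numeric.

```r
library(dplyr)
library(tidyr)

# Illustrative data using the values from the question
df <- tibble(
  X1 = c("Wi_1984", "Su_1985", "Su_1983", "Wi_1982", "Su_1986", "Su_1984")
)

sorted <- df %>%
  # Keep X1 and add season/year helper columns (year becomes an integer)
  separate(X1, into = c("season", "year"), sep = "_",
           remove = FALSE, convert = TRUE) %>%
  # Sort by year, then by season within a year
  arrange(year, season) %>%
  # Drop the helper columns so only the original column remains
  select(-season, -year)

sorted$X1
# [1] "Wi_1982" "Su_1983" "Su_1984" "Wi_1984" "Su_1985" "Su_1986"
```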