I have a data frame that I want to sort by one column, then by the next (using the tidyverse if possible).

I checked the question linked below, but the solutions there did not seem to work:

Order a "mixed" vector (numbers with letters)

Sample code for an example:

library(dplyr)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

This does not give me what I want:

df <- df %>% arrange(variable, level)

The rows come out in this order:

variable  level
channel   DIR
channel   EA
channel   IA
comp_ded  1000
comp_ded  500
comp_ded  750

I need them in this order:

variable  level
channel   DIR
channel   EA
channel   IA
comp_ded  500
comp_ded  750
comp_ded  1000

The real data set has several different values of "variable": about half need their levels sorted numerically and the other half alphabetically. Does anyone know how to do this?

Jordan

6 Answers

3

The simplest solution would be to use dplyr::group_by.

library(dplyr)

variable <- c("channel", "channel", "channel", "comp_ded", "comp_ded", "comp_ded")
level <- c("DIR", "EA", "IA", "500", "750", "1000")
df <- as_tibble(cbind(variable, level))

df %>%
  group_by(variable, level) %>%
  arrange()  # arrange() with no columns leaves the rows in their input order

# A tibble: 6 x 2
# Groups:   variable, level [6]
  variable level
  <chr>    <chr>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000 
zlipp
2

It's slightly ugly, but you could just split the data frame in two using filter statements, arrange each section individually, and then bind them back together:

df <- bind_rows(df %>%
              filter(!is.na(as.numeric(level))) %>%
              arrange(variable, as.numeric(level)),
          df %>%
              filter(is.na(as.numeric(level))) %>%
              arrange(variable, level))

Gives you:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 comp_ded 500  
2 comp_ded 750  
3 comp_ded 1000 
4 channel  DIR  
5 channel  EA   
6 channel  IA   
divibisan
2

Using gtools, a slightly shorter solution which uses mixedorder:

library(gtools)
# mixedorder() returns a permutation; order() of it gives ranks usable as a sort key
sorteddf <- df[with(df, order(variable, order(mixedorder(level)))), ]

Output:

  variable level
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000
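
One caveat with this pattern, sketched in base R: mixedorder(), like order(), returns a permutation of row positions rather than a rank for each element. The two coincide only for some inputs (the sample data happens to be one of them), so a permutation passed as a secondary key to order() can sort incorrectly. In the sketch below, order() stands in for mixedorder() so that no extra package is needed; x and g are made-up illustration data.

```r
# Made-up data: one group, three values that are not yet sorted
x <- c(30, 10, 20)
g <- c("a", "a", "a")

perm <- order(x)     # permutation: positions of the elements in sorted order
rnk  <- order(perm)  # rank of each element, which is what a sort key needs

wrong <- order(g, perm)  # permutation used as a key
right <- order(g, rnk)   # rank used as a key

x[wrong]  # not in ascending order
x[right]  # ascending order
```

With gtools, the same conversion would be order(mixedorder(level)).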
Marcus Campbell
1

You could create a temporary variable for sorting. Once you've sorted in the desired order, you can also set the order permanently by converting to factor (as in @Vlo's answer). Maybe something like this:

df = df %>% 
  mutate(tmp = as.numeric(level)) %>% 
  arrange(variable, tmp, level) %>% 
  select(-tmp) %>% 
  mutate(level = factor(level, levels=unique(level)))

  variable level
  <chr>    <fct>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000

I think you can also shorten this by not explicitly creating a temporary variable, and instead using an "anonymous" variable inside arrange:

df = df %>% 
  arrange(variable, as.numeric(level), level) %>% 
  mutate(level = factor(level, levels=unique(level)))
eipi10
1

Convert level to a factor and set the order of its levels. This is even easier with forcats::fct_relevel().

# Convert to factor
df <- as_tibble(cbind(variable, level)) %>%
  mutate(level = as.factor(level))

# Change order of levels (assigning to levels() would relabel the values, not reorder them)
df$level <- factor(df$level, levels = c("DIR", "EA", "IA", "500", "750", "1000"))

df %>% arrange(level)

# A tibble: 6 x 2
  variable level
  <chr>    <fct>
1 channel  DIR  
2 channel  EA   
3 channel  IA   
4 comp_ded 500  
5 comp_ded 750  
6 comp_ded 1000 
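
A minimal sketch of the forcats::fct_relevel() route mentioned above, assuming the forcats package is installed. The names f, f2 and sorted are only for illustration, and the shuffled index vector is there just to show that sorting follows the level order.

```r
library(forcats)

level <- c("DIR", "EA", "IA", "500", "750", "1000")
f <- factor(level)  # default levels are sorted as strings; exact order can depend on locale

# Move the listed levels to the front, in the order given
f2 <- fct_relevel(f, "DIR", "EA", "IA", "500", "750", "1000")
levels(f2)

# Sorting a factor follows the order of its levels, not the text of the labels
sorted <- as.character(sort(f2[c(6, 3, 5, 1, 4, 2)]))
sorted
```

Unlike assigning to levels(), fct_relevel() only changes the order of the levels and leaves the values themselves untouched.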
Vlo
  • This makes sense for the example. My real data set has 220 observations – Jordan Apr 05 '18 at 20:36
  • @Jordan It changes the levels of the factor. It will work on any number of observations, since you are sorting a factor (which is sorted by the order of its levels). If you have more levels, define their ordering manually. Or, if you are looking for a simple A-Z, 0-9 sort for more than 6 factor levels, then split sorting is probably the best way. – Vlo Apr 05 '18 at 20:39
  • split sorting? I'll have to look that up.. It's 10 different variables and levels within those. Thanks. – Jordan Apr 05 '18 at 20:40
  • By split, I meant the way divibisan's answer is sorting. – Vlo Apr 05 '18 at 20:41
0

I think it's much easier to sort by as.numeric(level) first, then by level:

df %>% arrange(variable, as.numeric(level), level)

Gives:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 channel  DIR
2 channel  EA
3 channel  IA
4 comp_ded 500
5 comp_ded 750
6 comp_ded 1000
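
Note that as.numeric() warns about NAs introduced by coercion for the non-numeric levels. As a rough base R sketch of the same idea (not the dplyr call above), the version below suppresses that warning and relies on order() placing NA last by default, so rows with non-numeric levels tie on the numeric key and fall through to the text key. The data is the question's sample with the rows shuffled; num_key and sorted are illustrative names.

```r
# Sample data from the question, with rows shuffled so the sort has work to do
df <- data.frame(
  variable = c("comp_ded", "channel", "comp_ded", "channel", "comp_ded", "channel"),
  level    = c("1000", "IA", "500", "DIR", "750", "EA"),
  stringsAsFactors = FALSE
)

# Numeric key: NA for non-numeric levels (coercion warning suppressed on purpose)
num_key <- suppressWarnings(as.numeric(df$level))

# Sort by variable, then numerically where possible, then alphabetically
sorted <- df[order(df$variable, num_key, df$level), ]
rownames(sorted) <- NULL
sorted
```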
Joakim