Arrange groups in dplyr with aggregation function

Question

I want to order groups in a dplyr tibble by using an aggregation function. Let's say I have this data:

library(dplyr)
df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

Here, g is the grouping variable. If I sort by the mean of x within each group, I would expect the g = 2 observations on top (mean about -2.67), then g = 0 (mean 0.5), and then g = 1 (mean 1.5). The first thing that comes to mind is:

df %>% 
  group_by(g) %>% 
  arrange(mean(x))

But this is not sorted the way I expected:

# A tibble: 7 x 2
# Groups:   g [3]
      g     x
  <dbl> <dbl>
1     0     0
2     0     1
3     1     1
4     1     2
5     2    -2
6     2    -3
7     2    -3

Instead, I would expect something like:

# A tibble: 7 x 2
# Groups:   g [3]
      g     x
  <dbl> <dbl>
1     2    -2
2     2    -3
3     2    -3
4     0     0
5     0     1
6     1     1
7     1     2

Is there a tidy way to do this?
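A minimal sketch of why the naive attempt leaves the rows where they were. The dplyr documentation for arrange() notes that it largely ignores grouping and that functions of variables are evaluated once per data frame, not once per group. So mean(x) inside arrange() is one number shared by all seven rows, and a constant sort key does not reorder anything. The per-group means are what the question actually wants to sort by (the names whole_mean, group_means and group_order are only illustrative):

```r
library(dplyr)

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

# What arrange(mean(x)) effectively sorts by: a single mean over all rows
# (-4/7), identical for every row, so the original row order is kept.
whole_mean <- mean(df$x)

# What the question wants instead: one mean per group.
group_means <- df %>%
  group_by(g) %>%
  summarise(m = mean(x))

# Groups in increasing order of their mean
group_order <- group_means$g[order(group_means$m)]
group_order  # 2 0 1
```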

You could try `df %>% group_by(g) %>% arrange(group_by(., g) %>% mutate(across(x, mean)) %>% pull(x))`, but I'm not sure whether it would always work reliably. — tmfmnk, Feb 03 '21 at 08:31
Or https://stackoverflow.com/questions/46008444/how-to-reorder-a-data-frame-based-on-mean-of-column-groups — tjebo, Feb 03 '21 at 08:52

score 2 · Accepted Answer · answered Feb 03 '21 at 08:18

Does this work?

df %>% group_by(g) %>% mutate(m = mean(x)) %>% arrange(m) %>% select(-m)
# A tibble: 7 x 2
# Groups:   g [3]
      g     x
  <dbl> <dbl>
1     2    -2
2     2    -3
3     2    -3
4     0     0
5     0     1
6     1     1
7     1     2
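A possible shorter variant, using a swapped-in base R technique rather than this answer's code: ave(x, g) returns, for every row, the mean of x within that row's group (mean is ave()'s default FUN), so it can serve directly as the sort key in an ungrouped arrange() without a temporary column. Adding g as a second key is my own assumption; it stops the rows of two groups with equal means from interleaving. Unlike the pipeline above, the result is not grouped.

```r
library(dplyr)

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

# ave(x, g) repeats each group's mean on every row of that group;
# g breaks ties between groups that share the same mean.
res <- df %>% arrange(ave(x, g), g)
res
```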

Karthik S

This does work. I was looking for a more "automatic" way to do it, like a function. Do you think a general function could be extracted from your pipeline? – David Masip Feb 03 '21 at 08:27
Hmm, exactly the same code as the accepted answer here? https://stackoverflow.com/questions/46008444/how-to-reorder-a-data-frame-based-on-mean-of-column-groups – StupidWolf Feb 03 '21 at 08:55
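On the request for a general function: one way to wrap the accepted answer's pipeline is sketched below. The name arrange_groups_by and its arguments are hypothetical (it is not a dplyr function); it relies on dplyr's support for "embracing" arguments with {{ }} to forward a column name and a summary expression such as mean(x). Using the group column as a tie-breaker and returning an ungrouped tibble are design choices of this sketch, not of the original answer.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): sorts whole groups of `.data`
# by a one-value-per-group summary expression such as mean(x).
arrange_groups_by <- function(.data, .group, .summary, .desc = FALSE) {
  keyed <- .data %>%
    group_by({{ .group }}) %>%
    # {{ }} forwards the caller's expression; it is evaluated once per group
    mutate(.group_key = {{ .summary }}) %>%
    ungroup()

  # The group column breaks ties so groups with equal summaries don't interleave
  if (.desc) {
    keyed <- arrange(keyed, desc(.group_key), {{ .group }})
  } else {
    keyed <- arrange(keyed, .group_key, {{ .group }})
  }

  select(keyed, -.group_key)
}

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

res <- arrange_groups_by(df, g, mean(x))
res_desc <- arrange_groups_by(df, g, mean(x), .desc = TRUE)
```

Any expression that reduces a group to one value should work as the second argument, for example median(x) or max(x).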

score 1 · Answer 2 · answered Feb 03 '21 at 08:33

You can first compute the order in which you want the groups to appear:

order_vec <- names(sort(tapply(df$x, df$g, mean)))
order_vec
#[1] "2" "0" "1"

Then use dplyr::arrange():

library(dplyr)
df %>% arrange(match(g, order_vec))

#      g     x
#  <dbl> <dbl>
#1     2    -2
#2     2    -3
#3     2    -3
#4     0     0
#5     0     1
#6     1     1
#7     1     2

Or, with base R subsetting:

df[order(match(df$g, order_vec)), ]
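To see what the match(g, order_vec) key used in both versions contributes: it replaces each row's group value with that group's position in order_vec, so sorting by it puts group "2" first. Note that g is numeric while order_vec holds the character names produced by tapply(); match() coerces both to a common (character) type, which works for these whole-number labels. The name rank_key is only illustrative.

```r
library(dplyr)

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

# Group labels sorted by increasing group mean: "2" "0" "1"
order_vec <- names(sort(tapply(df$x, df$g, mean)))

# Position of each row's group in order_vec (g is compared as character)
rank_key <- match(df$g, order_vec)
rank_key  # 2 2 3 3 1 1 1
```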