I want to order groups in a dplyr tibble by using an aggregation function. Let's say I have this data:

library(dplyr)
df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

Here g is the grouping variable. If I sort by the mean of x within each group, I would expect the g=2 observations to come first, then g=0, and then g=1. The first thing that comes to mind is:

df %>% 
  group_by(g) %>% 
  arrange(mean(x))

But this is not sorted the way I expected:

# A tibble: 7 x 2
# Groups:   g [3]
      g     x
  <dbl> <dbl>
1     0     0
2     0     1
3     1     1
4     1     2
5     2    -2
6     2    -3
7     2    -3

Instead, I would expect to have something like:

# A tibble: 7 x 2
# Groups:   g [3]
      g     x
  <dbl> <dbl>
1     2    -2
2     2    -3
3     2    -3
4     0     0
5     0     1
6     1     1
7     1     2

Is there a tidy way to do this operation?

David Masip
  • You could try `df %>% group_by(g) %>% arrange(group_by(., g) %>% mutate(across(x, mean)) %>% pull(x))`, but I'm not sure whether it would always work reliably. – tmfmnk Feb 03 '21 at 08:31
  • `df[order(ave(df$x, df$g, FUN = mean)), ]` – rawr Feb 03 '21 at 08:37
  • Or https://stackoverflow.com/questions/46008444/how-to-reorder-a-data-frame-based-on-mean-of-column-groups – tjebo Feb 03 '21 at 08:52
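The `ave()` idea from the comment above can also be used directly inside `arrange()`. This is a minimal sketch, assuming an ungrouped tibble: `ave()` returns one value per row (the mean of that row's group), so sorting on it moves whole groups together. The name `sorted` is only for illustration.

```r
library(dplyr)

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

# ave(x, g, FUN = mean) gives each row the mean of x within its g group,
# so arrange() orders the rows by group mean without a helper column.
sorted <- df %>%
  arrange(ave(x, g, FUN = mean))

sorted
```

Because the grouping is passed to `ave()` explicitly, no `group_by()` is needed here; `arrange()` largely ignores grouping unless `.by_group = TRUE` is set.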

2 Answers

Does this work?

df %>% group_by(g) %>% mutate(m = mean(x)) %>% arrange(m) %>% select(-m)
# A tibble: 7 x 2
# Groups:   g [3]
      g     x
  <dbl> <dbl>
1     2    -2
2     2    -3
3     2    -3
4     0     0
5     0     1
6     1     1
7     1     2
Karthik S
  • This works indeed, I was looking for a more "automatic" way to do it, like a function. Do you think a general function can be extracted from your pipeline in here? – David Masip Feb 03 '21 at 08:27
  • hmmm exact same code as the accepted answer? https://stackoverflow.com/questions/46008444/how-to-reorder-a-data-frame-based-on-mean-of-column-groups – StupidWolf Feb 03 '21 at 08:55
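Regarding the request in the comments for a more general function, the pipeline above could be wrapped roughly as follows. This is only a sketch: `arrange_groups_by` and the temporary column `.stat` are hypothetical names, not part of dplyr, and the helper assumes the data has no existing column called `.stat`. Unlike the pipeline above, it returns an ungrouped tibble.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): order whole groups by a
# summary statistic of one column, keeping each group's rows together.
arrange_groups_by <- function(data, group, value, fun = mean) {
  data %>%
    group_by({{ group }}) %>%
    mutate(.stat = fun({{ value }})) %>%  # per-group statistic on every row
    ungroup() %>%
    arrange(.stat, {{ group }}) %>%       # ties between groups broken by the group key
    select(-.stat)                        # drop the temporary column
}

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

by_mean <- arrange_groups_by(df, g, x)
by_max  <- arrange_groups_by(df, g, x, fun = max)
```

Passing a different `fun` (for example `max` or `median`) changes the statistic used to rank the groups.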

First, compute the order in which the groups should appear, sorted by their mean of x:

order_vec <- names(sort(tapply(df$x, df$g, mean)))
order_vec
#[1] "2" "0" "1"

Then use dplyr::arrange:

library(dplyr)
df %>% arrange(match(g, order_vec))

#      g     x
#  <dbl> <dbl>
#1     2    -2
#2     2    -3
#3     2    -3
#4     0     0
#5     0     1
#6     1     1
#7     1     2

Or use base R subsetting:

df[order(match(df$g, order_vec)), ]
Ronak Shah
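A note on the `match()` approach above: `order_vec` holds the group labels as character strings, while `g` is numeric, so `match()` coerces `g` to character before comparing. That works for this data. As a hedged alternative that keeps the key numeric, the group means can be computed as a small table and joined back; the names `group_means`, `m`, and `sorted_join` below are chosen for illustration only.

```r
library(dplyr)

df <- tibble(
  g = c(0, 0, 1, 1, 2, 2, 2),
  x = c(0, 1, 1, 2, -2, -3, -3)
)

# One row per group with its mean of x
group_means <- df %>%
  group_by(g) %>%
  summarise(m = mean(x))

# Attach each row's group mean, sort by it, then drop the helper column
sorted_join <- df %>%
  left_join(group_means, by = "g") %>%
  arrange(m) %>%
  select(-m)

sorted_join
```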