Given the following data.frame, I want the order of 'numbers' within each group.

df
   group     numbers
1      A -0.80097537
2      B -0.69498701
3      C  0.55627105
4      D -0.05810593
5      A -1.41748489
6      B  0.30198594
7      C  1.11918243
8      D  0.02595183
9      A  1.74417489
10     B  0.42435785
11     C  0.75889049
12     D -2.21025222
13     A  0.57149543
14     B  0.77944238
15     C  3.04021182
16     D -0.14157181
17     A -0.29213733
18     B -1.00858701
19     C  1.49959112
20     D -0.57532183

structure(list(group = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), class = "factor", .Label = c("A", 
"B", "C", "D")), numbers = c(-0.800975371801324, -0.694987011133934, 
0.556271051640264, -0.0581059266921911, -1.41748489222262, 0.301985943949874, 
1.11918243487368, 0.0259518302570701, 1.74417489077084, 0.424357848275249, 
0.758890492984891, -2.2102522179535, 0.571495432426037, 0.779442380219119, 
3.04021182328692, -0.141571814386413, -0.292137333159453, -1.00858701158259, 
1.49959111842538, -0.575321833031783)), .Names = c("group", "numbers"
), row.names = c(NA, -20L), class = "data.frame")

Somehow for group D the order is incorrect:

library(dplyr)

df %>% 
  group_by(group) %>% 
  mutate(x = order(numbers)) %>% 
  arrange(group, x)

Source: local data frame [20 x 3]
Groups: group [4]

    group     numbers     x
   <fctr>       <dbl> <int>
1       A -1.41748489     1
2       A -0.80097537     2
3       A -0.29213733     3
4       A  0.57149543     4
5       A  1.74417489     5
6       B  0.30198594     1
7       B  0.42435785     2
8       B  0.77944238     3
9       B -1.00858701     4
10      B -0.69498701     5
11      C  0.55627105     1
12      C  0.75889049     2
13      C  1.11918243     3
14      C  1.49959112     4
15      C  3.04021182     5
16      D -0.14157181     1
17      D -0.57532183     2
18      D -0.05810593     3
19      D -2.21025222     4
20      D  0.02595183     5

Specifically, row 19 is wrong: -2.21025222 is the smallest value in group D, yet it gets x = 4 instead of 1. Any idea what my misconception is?

c0bra
  • `df %>% group_by(group) %>% arrange(group, numbers)` gives the correct output. You can use `dense_rank` instead of `order`, i.e. `df %>% group_by(group) %>% mutate(x=dense_rank(numbers)) %>% arrange(group, x)` – akrun Feb 20 '17 at 10:17
  • But why are order and dense_rank giving different results? – c0bra Feb 20 '17 at 10:20
  • Just do this `df %>% group_by(group) %>% mutate(x=order(order(numbers))) %>% arrange(group, x)` – akrun Feb 20 '17 at 10:25
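
Why the double `order()` in the last comment works can be checked in base R. This is only a small sketch: the vector `b` is a rounded copy of group B's values from the question, and the names `o` and `r` are made up for the illustration.

```r
# Group B's values in their original row order (rounded copy from the question)
b <- c(-0.69498701, 0.30198594, 0.42435785, 0.77944238, -1.00858701)

o <- order(b)   # indices that would sort b; the first entry is the position of the smallest value
o               # 5 1 2 3 4
b[o]            # b sorted ascending

r <- order(o)   # inverting that permutation yields each element's rank
r               # 2 3 4 5 1

# There are no ties here, so the result agrees with rank()
identical(r, as.integer(rank(b)))   # TRUE
```

Assigning `o` as a column, as in the question, attaches "position of the i-th smallest value" to row i, which is not the rank of row i.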

1 Answer

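`order()` returns the indices that would sort the vector: its i-th element is the position of the i-th smallest value, not the rank of the i-th value. The two coincide only when that permutation is its own inverse, which is why groups A and C look right while B and D do not. What is needed here is a rank: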

library(tidyverse)
df %>% 
    group_by(group) %>% 
    mutate(x=dense_rank(numbers)) %>% 
    arrange(group, x)
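
The difference can also be seen without dplyr. The following is a minimal base R sketch, assuming a vector `d` that holds a rounded copy of group D's values in their original row order:

```r
# Group D's values in their original row order (rounded copy from the question)
d <- c(-0.05810593, 0.02595183, -2.21025222, -0.14157181, -0.57532183)

order(d)   # 3 5 4 1 2: the smallest value sits at position 3, the next at position 5, ...
rank(d)    # 4 5 1 3 2: the first value is the 4th smallest, the second the 5th, ...

# order() is meant for indexing, rank() for labelling each element
d[order(d)]                       # the values sorted ascending
identical(d[order(d)], sort(d))   # TRUE
```

The `order(d)` result is exactly the `x` column that the question's code attached to group D, which explains why -2.21025222 ended up with 4.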
akrun
  • [rank-and-order-in-r](http://stackoverflow.com/questions/12289224/rank-and-order-in-r) helped me to understand. Simple `rank` from base R seems to be sufficient, too. – c0bra Feb 20 '17 at 10:24
  • @c0bra Yes, that is true, but in case of ties `rank` falls back on its default `ties.method = "average"`, which returns fractional ranks – akrun Feb 20 '17 at 10:25
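
How the alternatives treat ties can be compared in base R. This is a sketch on a hypothetical vector `x` that is not part of the question's data; the `match()` line is a base R stand-in for what `dplyr::dense_rank()` returns.

```r
# Hypothetical vector with a tie, not taken from the question
x <- c(10, 20, 20, 30)

rank(x)                          # 1.0 2.5 2.5 4.0  (default ties.method = "average")
rank(x, ties.method = "min")     # 1 2 2 4          (gap after the tie)
rank(x, ties.method = "first")   # 1 2 3 4          (ties broken by position)
order(order(x))                  # 1 2 3 4          (same as ties.method = "first")

# Consecutive ranks without gaps, as dplyr::dense_rank() would give
match(x, sort(unique(x)))        # 1 2 2 3
```

With the question's data there are no ties within a group, so all of these variants give the same integers there.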