0

I'm trying to rank the certain groups by their counts using dense_rank, it doesn't make a distinct rank for groups that are tied. And any ranking function I try that has some sort of ties.method doesn't give me the rankings in a consecutive 1,2,3 order. Example:

library(dplyr)
id <- c(rep(1, 8),
        rep(2, 8))

fruit <- c(rep('apple', 4), rep('orange', 1), rep('banana', 2), 'orange',
           rep('orange', 4), rep('banana', 1), rep('apple', 2), 'banana')

df <- data.frame(id, fruit, stringsAsFactors = FALSE)

df2 <- df %>% 
  mutate(counter = 1) %>% 
  group_by(id, fruit) %>% 
  mutate(fruitCnt = sum(counter)) %>% 
  ungroup() %>% 
  group_by(id) %>% 
  mutate(fruitCntRank = dense_rank(desc(fruitCnt))) %>% 
  select(id, fruit, fruitCntRank)
df2

      id fruit  fruitCntRank
 1     1 apple             1
 2     1 apple             1
 3     1 apple             1
 4     1 apple             1
 5     1 orange            2
 6     1 banana            2
 7     1 banana            2
 8     1 orange            2
 9     2 orange            1
10     2 orange            1
11     2 orange            1
12     2 orange            1
13     2 banana            2
14     2 apple             2
15     2 apple             2
16     2 banana            2

It doesn't matter which of orange or banana are ranked 3, and it doesn't even need to be consistent. I just need the groups to be ranked 1, 2, 3.

Desired result:

      id fruit  fruitCntRank
 1     1 apple             1
 2     1 apple             1
 3     1 apple             1
 4     1 apple             1
 5     1 orange            2
 6     1 banana            3
 7     1 banana            3
 8     1 orange            2
 9     2 orange            1
10     2 orange            1
11     2 orange            1
12     2 orange            1
13     2 banana            2
14     2 apple             3
15     2 apple             3
16     2 banana            2
CoolGuyHasChillDay
  • 659
  • 1
  • 6
  • 21
  • Does this answer your question? [How to emulate SQLs rank functions in R?](https://stackoverflow.com/questions/11446254/how-to-emulate-sqls-rank-functions-in-r) – nikn8 Apr 20 '20 at 03:05

1 Answers1

1

We can add count for each id and fruit combination, arrange them in descending order of count and get the rank using match.

library(dplyr)

df %>%
  add_count(id, fruit) %>%
  arrange(id, desc(n)) %>%
  group_by(id) %>%
  mutate(n = match(fruit, unique(fruit)))
  #Another option with cumsum and duplicated
  #mutate(n = cumsum(!duplicated(fruit)))

#      id fruit      n
#   <dbl> <chr>  <int>
# 1     1 apple      1
# 2     1 apple      1
# 3     1 apple      1
# 4     1 apple      1
# 5     1 orange     2
# 6     1 banana     3
# 7     1 banana     3
# 8     1 orange     2
# 9     2 orange     1
#10     2 orange     1
#11     2 orange     1
#12     2 orange     1
#13     2 banana     2
#14     2 apple      3
#15     2 apple      3
#16     2 banana     2
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Looks like with your first solution, it's actually just ranking in alphabetical order and doesn't match the desired results. I'll try your other solutions and `group_indices`, thanks – CoolGuyHasChillDay Apr 20 '20 at 03:16
  • @CoolGuyHasChillDay I think I had misunderstood your question earlier. I did not take into account counts. Can you try the updated answer? – Ronak Shah Apr 20 '20 at 03:30
  • Hey sorry this wasn't clear, I'm trying to rank the groups from most observations to least observations within each `id` group. So your latest answer is correct for `id = 1` but not `id = 2`. I'd like to make it work in one dplyr statement, but I guess I could just make a bridge-table with `df %>% count(id, fruit) %>% arrange(id, desc(n)) %>% group_by(id) %>% mutate(fruitCntRank = 1:n())` and join it back in – CoolGuyHasChillDay Apr 20 '20 at 05:55
  • Sorry, I had to remove `fruit` from `arrange`. It should work now as per your requirement. Can you check? – Ronak Shah Apr 20 '20 at 05:59