Dense ranking of column based on order of second column

Question

I am racking my brain over something that is probably straightforward. I want a "dense" ranking (as defined for the data.table::frank function) of a column in a data frame, but not based on that column's own order; the order should instead be given by another column (val in my example).

I managed to get a dense ranking with @Prasad Chalasani's solution, like this:

library(dplyr)
foo_df <- data.frame(id = c(4,1,1,3,3), val = letters[1:5])

foo_df %>% arrange(val) %>% mutate(id_fac = as.integer(factor(id)))
#>   id val id_fac
#> 1  4   a      3
#> 2  1   b      1
#> 3  1   c      1
#> 4  3   d      2
#> 5  3   e      2

But I would like the factor levels to be ordered based on val. Desired output:

foo_desired <-  foo_df %>% arrange(val) %>% mutate(id_fac = as.integer(factor(id, levels = c(4,1,3))))
foo_desired
#>   id val id_fac
#> 1  4   a      1
#> 2  1   b      2
#> 3  1   c      2
#> 4  3   d      3
#> 5  3   e      3

I tried data.table::frank.
I tried both methods by @Prasad Chalasani.
I tried setting the order of id using id[rank(val)] (and sort(val), and order(val)).
Finally, I also tried to sort the levels using rank(val) etc., but this throws an error (Evaluation error: factor level [3] is duplicated.)
I know that one can specify the level order by hand; I did that to create the desired output. That is not practical, however, as my real data has far more rows and levels.

I need this for convenience, to produce a table with a specific order, not for computations.

Created on 2018-12-19 by the reprex package (v0.2.1)

score 3 · Answer 1 · answered Dec 19 '18 at 15:37

You can use first(): after sorting by val, take the first val within each id, then convert that to an integer rank.

foo_df %>%
  arrange(val) %>%
  group_by(id) %>%
  mutate(id_fac = first(val)) %>%   # earliest val within each id
  ungroup() %>%
  mutate(id_fac = as.integer(factor(id_fac)))   # sorted levels give the dense rank
# A tibble: 5 x 3
     id    val id_fac
  <dbl> <fctr>  <int>
1     4      a      1
2     1      b      2
3     1      c      2
4     3      d      3
5     3      e      3
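
The same idea can be sketched in base R, without relying on the rows already being sorted: find each row's position in val order, take the earliest position per id, and dense-rank those positions. This is only a rough sketch; the helper names val_pos and first_pos are made up for illustration, and foo_df is rebuilt here so the snippet runs on its own.

```r
# Rebuild the question's example data (stringsAsFactors = FALSE is an explicit choice here)
foo_df <- data.frame(id = c(4, 1, 1, 3, 3), val = letters[1:5],
                     stringsAsFactors = FALSE)

# Position of each row when ordered by val
val_pos <- rank(foo_df$val, ties.method = "min")

# Earliest val position per id, repeated for every row of that id
first_pos <- ave(val_pos, foo_df$id, FUN = min)

# Dense rank: index of each value among the sorted distinct values
foo_df$id_fac <- match(first_pos, sort(unique(first_pos)))

foo_df
```

For the example data this should give id_fac values of 1, 2, 2, 3, 3, in line with the desired output in the question.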

BENY

That's a nice approach. However, I will accept @Ronak Shah's answer as more direct. – tjebo Dec 19 '18 at 15:50

Ronak Shah · Accepted Answer · 2018-12-19T15:51:09.767

2

Why do you even need factors? I may be missing something, but this gives your desired output.

You can use match to get id_fac based on the order in which each id first occurs.

library(dplyr)

foo_df %>%
    mutate(id_fac = match(id, unique(id)))

#  id val id_fac
#1  4   a      1
#2  1   b      2
#3  1   c      2
#4  3   d      3
#5  3   e      3
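
One caveat worth spelling out: match(id, unique(id)) ranks ids by first appearance in the current row order, which coincides with the val order here only because foo_df already happens to be sorted by val. A minimal base R sketch for data that is not pre-sorted follows; foo_shuffled and id_by_val are hypothetical names introduced for illustration.

```r
# Hypothetical shuffled copy of the question's data
foo_shuffled <- data.frame(id  = c(3, 1, 4, 3, 1),
                           val = c("e", "c", "a", "d", "b"),
                           stringsAsFactors = FALSE)

# Row order that sorts by val
ord <- order(foo_shuffled$val)

# Distinct ids in the order they first occur once sorted by val: 4, 1, 3
id_by_val <- unique(foo_shuffled$id[ord])

# Dense rank of id following val order; rows can stay in any order
foo_shuffled$id_fac <- match(foo_shuffled$id, id_by_val)

# Equivalent factor-based form, if factor levels are wanted after all
id_fac_factor <- as.integer(factor(foo_shuffled$id, levels = id_by_val))

foo_shuffled[ord, ]
```

With dplyr, the same effect should be achievable by putting arrange(val) before the mutate call.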

edited Dec 19 '18 at 15:51

answered Dec 19 '18 at 15:48

Ronak Shah

You are absolutely right, I don't need factors - it was only my method to get this dense ranking. Very nice solution. Thanks!! – tjebo Dec 19 '18 at 15:50
