
I am beating my brains out on something that is probably straightforward. I want a "dense" ranking (as defined for the data.table::frank function) of a column in a data frame. However, the ranking should not follow that column's own order; the order should be given by another column (val in my example).

I managed to get the dense ranking with @Prasad Chalasani's solution, like this:

library(dplyr)
foo_df <- data.frame(id = c(4,1,1,3,3), val = letters[1:5])

foo_df %>% arrange(val) %>% mutate(id_fac = as.integer(factor(id)))
#>   id val id_fac
#> 1  4   a      3
#> 2  1   b      1
#> 3  1   c      1
#> 4  3   d      2
#> 5  3   e      2
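This is why the codes come out as 3, 1, 1, 2, 2: when no levels argument is given, factor() uses the sorted distinct values as its levels. The integer codes therefore follow the numeric order of id, not the order of val. A quick base-R check using only the id values from the example:

```r
id <- c(4, 1, 1, 3, 3)

# With no levels argument, factor() uses the sorted distinct values as levels
levels(factor(id))
#> [1] "1" "3" "4"

# So the integer codes follow the numeric order of id, not the order of val
as.integer(factor(id))
#> [1] 3 1 1 2 2
```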

But I would like the factor levels to be ordered based on val. Desired output:

foo_desired <- foo_df %>% arrange(val) %>% mutate(id_fac = as.integer(factor(id, levels = c(4,1,3))))
foo_desired
#>   id val id_fac
#> 1  4   a      1
#> 2  1   b      2
#> 3  1   c      2
#> 4  3   d      3
#> 5  3   e      3
  • I tried data.table::frank.
  • I tried both methods by @Prasad Chalasani.
  • I tried setting the order of id using id[rank(val)] (and sort(val), and order(val)).
  • Finally, I also tried to sort the levels using rank(val) etc., but this throws an error (Evaluation error: factor level [3] is duplicated.)

  • I know that one can specify the level order; I used this to create the desired output. However, this is not practical, because my real data has many more rows and levels.
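The manual levels = c(4,1,3) approach can be generalized. Once the rows are sorted by val, unique(id) lists the ids in exactly the order they should be numbered. Because unique() drops repeats, it also avoids the duplicated-level error. Below is a minimal base-R sketch, which assumes that sorting val gives the desired order:

```r
# Example data from the question; stringsAsFactors = FALSE keeps val as character
foo_df <- data.frame(id = c(4, 1, 1, 3, 3), val = letters[1:5],
                     stringsAsFactors = FALSE)

# Sort by val so that the first appearance of each id follows val's order
foo_sorted <- foo_df[order(foo_df$val), ]

# unique() keeps ids in order of first appearance and drops the duplicates
# that would trigger the "factor level is duplicated" error
foo_sorted$id_fac <- as.integer(factor(foo_sorted$id, levels = unique(foo_sorted$id)))
foo_sorted
#>   id val id_fac
#> 1  4   a      1
#> 2  1   b      2
#> 3  1   c      2
#> 4  3   d      3
#> 5  3   e      3
```

In the dplyr pipeline from the question, the equivalent would be mutate(id_fac = as.integer(factor(id, levels = unique(id)))) after arrange(val).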

I only need this for convenience, to produce a table in a specific order, not for computations.

Created on 2018-12-19 by the reprex package (v0.2.1)

tjebo

2 Answers


You can use first(): after sorting by val, take the first val of each id and rank those values.

library(dplyr)

foo_df %>%
  arrange(val) %>%
  group_by(id) %>%
  mutate(id_fac = first(val)) %>%   # earliest val of each id after sorting
  ungroup() %>%
  mutate(id_fac = as.integer(factor(id_fac)))
# A tibble: 5 x 3
     id    val id_fac
  <dbl> <fctr>  <int>
1     4      a      1
2     1      b      2
3     1      c      2
4     3      d      3
5     3      e      3
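The same idea can also be sketched in base R without dplyr. This variant is an assumption, not the answer's own code. After arrange(val), the first val of each id is its smallest val, so min() within each id (computed with ave()) gives the same key. The factor round-trip is replaced by a dense rank built from match() and sort(unique()). Unlike the pipeline above, this does not depend on the row order of the input.

```r
# Example data from the question; stringsAsFactors = FALSE keeps val as character
foo_df <- data.frame(id = c(4, 1, 1, 3, 3), val = letters[1:5],
                     stringsAsFactors = FALSE)

# Smallest val for each id, repeated on every row of that id
# (after sorting by val this equals first(val) within each id)
first_val <- ave(foo_df$val, foo_df$id, FUN = min)

# Dense rank of those keys: position among the sorted distinct keys
foo_df$id_fac <- match(first_val, sort(unique(first_val)))
foo_df
#>   id val id_fac
#> 1  4   a      1
#> 2  1   b      2
#> 3  1   c      2
#> 4  3   d      3
#> 5  3   e      3
```

With dplyr loaded, dplyr::dense_rank(first_val) could take the place of the match()/sort(unique()) step.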
BENY

Why do you even need factors? Unless I am missing something, this gives your desired output.

You can use match to number each id by the order in which it first occurs.

library(dplyr)

foo_df %>%
    mutate(id_fac = match(id, unique(id)))

#  id val id_fac
#1  4   a      1
#2  1   b      2
#3  1   c      2
#4  3   d      3
#5  3   e      3
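One caveat: match(id, unique(id)) numbers the ids in the order they appear in the rows. It reproduces the val-based order only because foo_df is already sorted by val. The base-R sketch below scrambles the rows to show the difference (the permutation is arbitrary and chosen only for illustration). Sorting by val first fixes the problem:

```r
# Example data from the question; stringsAsFactors = FALSE keeps val as character
foo_df <- data.frame(id = c(4, 1, 1, 3, 3), val = letters[1:5],
                     stringsAsFactors = FALSE)

# Deliberately scramble the rows to show the dependence on row order
shuffled <- foo_df[c(5, 2, 1, 4, 3), ]

# Without sorting, ids are numbered by first appearance in the scrambled order
shuffled$naive <- match(shuffled$id, unique(shuffled$id))
shuffled$naive
#> [1] 1 2 3 1 2

# Sorting by val first restores the val-based numbering
sorted <- shuffled[order(shuffled$val), ]
sorted$id_fac <- match(sorted$id, unique(sorted$id))
sorted$id_fac
#> [1] 1 2 2 3 3
```

In the dplyr version, this corresponds to adding arrange(val) before the mutate() call.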
Ronak Shah
  • You are absolutely right, I don't need factors - it was only my method to get this dense ranking. Very nice solution. Thanks!! – tjebo Dec 19 '18 at 15:50