I'd like to add a (dense) rank column to my data frame based on multiple other columns, like `rank() over (order by a, b)` in SQL. In R, the `rank` function only accepts one column, so `mutate(df, rank(a, b))` throws an error. The `order_by` function also only accepts one column.

So given this data frame:

d <- data.frame(a = c(1, 1, 1, 2), b = c(1, 1, 2, 2))

...I'd like a rank like the following:

 a  b  rank
 1  1  1 
 1  1  1 
 1  2  2 
 2  2  3 

My actual data frame is much larger, and the ranking needs to be over multiple columns of different types (mostly strings and doubles).

Henrik
Brian
  • If you are happy to leave the `verse`, you may use `data.table::frank(d, ties.method = "dense")`. From `?frank`: "Similar to `base::rank` but much faster. And it accepts vectors, lists, data.frames or data.tables as input [...] If `...` is missing, all columns are considered by default. To sort by a column in descending order prefix "`-`", e.g., `frank(x, a, -b, c)`." – Henrik Nov 29 '18 at 11:29
  • The lack of `dplyr` answers here [dplyr ranking observations across variables](https://stackoverflow.com/questions/28588971/dplyr-ranking-observations-across-variables) and here [Calculate rank with ties based on more than one variable](https://stackoverflow.com/questions/45917470/calculate-rank-with-ties-based-on-more-than-one-variable) suggests that it may not be straightforward in `dplyr`. – Henrik Nov 29 '18 at 11:35
  • Consider that `rank(x)` is very similar to `order(order(x))`, and the latter can be applied to multiple vectors. For instance, if `x` is a `list` or `data.frame`, you could try `order(do.call(order,x))`. – nicola Nov 29 '18 at 11:53
  • In this case, because a and b are both numbers under 10, you can use `mutate(df, rank = dense_rank(10*a + b))`. – Kerry Jackson Nov 29 '18 at 12:36
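
Building on the `do.call(order, x)` idea from the comments, a dense rank over several columns can be sketched in base R: sort by the key columns, flag every sorted row that starts a new key combination, and take a running count of those flags. This is only a minimal sketch, not a tested library routine. The helper name `dense_rank_by` and its `cols` argument are made up for illustration, and missing values simply follow the defaults of `order()` and `duplicated()`.

```r
# Hypothetical helper (not part of base R or dplyr): dense rank of the rows
# of `df`, ordered by the columns named in `cols`.
dense_rank_by <- function(df, cols) {
  keys <- df[cols]
  # Row order when sorting by all key columns, first column first.
  # unname() keeps column names from clashing with order()'s own arguments.
  ord <- do.call(order, unname(as.list(keys)))
  # After sorting, equal key combinations are adjacent, so a row starts a
  # new rank exactly when it is not a duplicate of an earlier sorted row.
  new_group <- !duplicated(keys[ord, , drop = FALSE])
  # The running count of group starts is the dense rank; write it back in
  # the original row order.
  ranks <- integer(nrow(df))
  ranks[ord] <- cumsum(new_group)
  ranks
}

d <- data.frame(a = c(1, 1, 1, 2), b = c(1, 1, 2, 2))
d$rank <- dense_rank_by(d, c("a", "b"))
print(d)  # the rank column is 1, 1, 2, 3
```

Because `order()` accepts vectors of different types, the same call should also work when the key columns mix strings and doubles. For a large data frame, the `data.table::frank(d, ties.method = "dense")` call quoted in the first comment is probably the faster route.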
