ranking dataframe using two columns in R

Question

I am new to using R, and I am having hard time trying to rank dataframe using two columns in R. The data is in the form of this.

I want result to be in the form of

which is ranked by B first, and the A is used to rank if there is equal value in B. I think my question is very similar to How to rank rows by two columns at once in R? and I tried the answers for this, but this didn't work for me.

I don't get it. Can you explain exactly how you obtained your rank column? Also, why is the third row different in the two data sets? — user2974951, Jan 07 '22 at 08:31
I think there are some typos in the second example table. Column `A` used to have a 3 in it (replaced with a 2) and column `B` used to have a 2 in it (replaced by a 1). — KamRa, Jan 07 '22 at 09:13
For a broader description of different ties methods see [How to emulate SQLs rank functions in R?](https://stackoverflow.com/questions/11446254/how-to-emulate-sqls-rank-functions-in-r). There you also find a nice [description of `data.table::frank`](https://stackoverflow.com/a/28141041) (highly relevant for you, when ranking on several columns). — Henrik, Jan 07 '22 at 10:33

Merijn van Tilborg · Answer 1 · 2022-01-07T09:10:22.560

What you can do is sort on B and A, give it a group ID and then rank on this group ID

Data

dt <- structure(list(A = c(1L, 2L, 2L, 4L, 5L), B = c(1L, 1L, 1L, 4L, 
3L)), row.names = c(NA, -5L), class = c("data.frame"
))

data.table

library(data.table)
setDT(dt)

setorder(dt, B, A)
dt[, Rank := .GRP, by = .(A, B)][, Rank := rank(Rank, ties.method = "min")]

dt
   A B Rank
1: 1 1    1
2: 2 1    2
3: 2 1    2
4: 5 3    4
5: 4 4    5

dyplr

library(dplyr)

dt %>% 
  arrange(B, A) %>%
  group_by(B, A) %>%
  mutate(Rank = cur_group_id()) %>%
  ungroup() %>%
  mutate(Rank = rank(Rank, ties.method = "min"))

# A tibble: 5 x 3
      A     B  Rank
  <int> <int> <int>
1     1     1     1
2     2     1     2
3     2     1     2
4     5     3     4
5     4     4     5

Maël · Accepted Answer · 2022-01-07T09:45:57.673

4

You can use data.table::frank or dplyr::min_rank:

`data.table::frank`

dt$Rank <- frank(dt, B, A, ties.method = "min")
dt
  A B Rank
1 1 1    1
2 2 1    2
3 2 1    2
4 4 4    5
5 5 3    4

`dplyr::min_rank`

mutate(dt, Rank = min_rank(paste(B,A)))
  A B Rank
1 1 1    1
2 2 1    2
3 2 1    2
4 4 4    5
5 5 3    4

Data

dt <- data.frame(A = c(1,2,2,4,5), B = c(1,1,1,4,3))

edited Jan 07 '22 at 09:45

answered Jan 07 '22 at 09:35

Maël

45,206
3
29
67

1

I was not aware of frank either, to do it on reference this seems to be the way `dt[, Rank := frank(.SD, ties.method = "min"), .SDcols = c("B", "A")]` – Merijn van Tilborg Jan 07 '22 at 10:16
Is it right that in the `dplyr` solution, `paste` won't work correctly if some of the values of `A` and `B` are two digits? – bill999 Sep 02 '22 at 16:30

score 1 · Answer 3 · answered Jan 07 '22 at 09:44

1

Perhaps this base R option could help

transform(
  df,
  Rank = rank(A[order(B, A)], ties.method = "min")
)

which gives

answered Jan 07 '22 at 09:44

ThomasIsCoding

96,636
9
24
81

csgroen · Answer 4 · 2022-01-07T09:35:18.603

The easiest way I can think of is using a function called arrange from dplyr to order first by B, then A, create the rank as the order of the columns, then re-arrange with the original arrangement to simulate your output.

To repeat the rank of identical entries, we can add that ifelse statement that looks at the previous row and keeps the rank if this row has identical A and B values.

Here's how that would look like:

df <- data.frame(A = c(1,2,2,4,5),
                 B = c(1,1,1,4,3))

library(tidyverse)
df %>%
    arrange(B,A) %>%
    mutate(Rank = 1:n(),
           Rank = ifelse(A == lag(A, default = 0) & B == lag(B, default = 0), 
                         lag(Rank), Rank)) %>%
    arrange(A,B)
#>   A B Rank
#> 1 1 1    1
#> 2 2 1    2
#> 3 2 1    2
#> 4 4 4    5
#> 5 5 3    4

^{Created on 2022-01-07 by the reprex package (v2.0.1)}

But the OP is getting duplicated 2 rank. And your code gives a different result — asd-tm, Jan 07 '22 at 08:54

ranking dataframe using two columns in R

4 Answers4

`data.table::frank`

`dplyr::min_rank`

Related