3

I am new to using R, and I am having hard time trying to rank dataframe using two columns in R. The data is in the form of this.

A B
1 1
2 1
2 1
4 4
5 3

I want result to be in the form of

A B Rank
1 1 1
2 1 2
2 1 2
4 4 5
5 3 4

which is ranked by B first, and the A is used to rank if there is equal value in B. I think my question is very similar to How to rank rows by two columns at once in R? and I tried the answers for this, but this didn't work for me.

Maël
  • 45,206
  • 3
  • 29
  • 67
Gongsoonyee
  • 138
  • 8
  • 4
    I don't get it. Can you explain exactly how you obtained your rank column? Also, why is the third row different in the two data sets? – user2974951 Jan 07 '22 at 08:31
  • 1
    I think there are some typos in the second example table. Column `A` used to have a 3 in it (replaced with a 2) and column `B` used to have a 2 in it (replaced by a 1). – KamRa Jan 07 '22 at 09:13
  • 1
    For a broader description of different ties methods see [How to emulate SQLs rank functions in R?](https://stackoverflow.com/questions/11446254/how-to-emulate-sqls-rank-functions-in-r). There you also find a nice [description of `data.table::frank`](https://stackoverflow.com/a/28141041) (highly relevant for you, when ranking on several columns). – Henrik Jan 07 '22 at 10:33

4 Answers4

4

What you can do is sort on B and A, give it a group ID and then rank on this group ID

Data

dt <- structure(list(A = c(1L, 2L, 2L, 4L, 5L), B = c(1L, 1L, 1L, 4L, 
3L)), row.names = c(NA, -5L), class = c("data.frame"
))

data.table

library(data.table)
setDT(dt)

setorder(dt, B, A)
dt[, Rank := .GRP, by = .(A, B)][, Rank := rank(Rank, ties.method = "min")]

dt
   A B Rank
1: 1 1    1
2: 2 1    2
3: 2 1    2
4: 5 3    4
5: 4 4    5

dyplr

library(dplyr)

dt %>% 
  arrange(B, A) %>%
  group_by(B, A) %>%
  mutate(Rank = cur_group_id()) %>%
  ungroup() %>%
  mutate(Rank = rank(Rank, ties.method = "min"))

# A tibble: 5 x 3
      A     B  Rank
  <int> <int> <int>
1     1     1     1
2     2     1     2
3     2     1     2
4     5     3     4
5     4     4     5
Merijn van Tilborg
  • 5,452
  • 1
  • 7
  • 22
4

You can use data.table::frank or dplyr::min_rank:

data.table::frank

dt$Rank <- frank(dt, B, A, ties.method = "min")
dt
  A B Rank
1 1 1    1
2 2 1    2
3 2 1    2
4 4 4    5
5 5 3    4

dplyr::min_rank

mutate(dt, Rank = min_rank(paste(B,A)))
  A B Rank
1 1 1    1
2 2 1    2
3 2 1    2
4 4 4    5
5 5 3    4

Data

dt <- data.frame(A = c(1,2,2,4,5), B = c(1,1,1,4,3))
Maël
  • 45,206
  • 3
  • 29
  • 67
  • 1
    I was not aware of frank either, to do it on reference this seems to be the way `dt[, Rank := frank(.SD, ties.method = "min"), .SDcols = c("B", "A")]` – Merijn van Tilborg Jan 07 '22 at 10:16
  • Is it right that in the `dplyr` solution, `paste` won't work correctly if some of the values of `A` and `B` are two digits? – bill999 Sep 02 '22 at 16:30
1

Perhaps this base R option could help

transform(
  df,
  Rank = rank(A[order(B, A)], ties.method = "min")
)

which gives

  A B Rank
1 1 1    1
2 2 1    2
3 2 1    2
4 4 4    5
5 5 3    4
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
0

The easiest way I can think of is using a function called arrange from dplyr to order first by B, then A, create the rank as the order of the columns, then re-arrange with the original arrangement to simulate your output.

To repeat the rank of identical entries, we can add that ifelse statement that looks at the previous row and keeps the rank if this row has identical A and B values.

Here's how that would look like:

df <- data.frame(A = c(1,2,2,4,5),
                 B = c(1,1,1,4,3))

library(tidyverse)
df %>%
    arrange(B,A) %>%
    mutate(Rank = 1:n(),
           Rank = ifelse(A == lag(A, default = 0) & B == lag(B, default = 0), 
                         lag(Rank), Rank)) %>%
    arrange(A,B)
#>   A B Rank
#> 1 1 1    1
#> 2 2 1    2
#> 3 2 1    2
#> 4 4 4    5
#> 5 5 3    4

Created on 2022-01-07 by the reprex package (v2.0.1)

csgroen
  • 2,511
  • 11
  • 28