Calculate rank based on several columns, with a precedence rule

Question

I have a dataframe like this

df <- expand.grid(0:1, 0:1, 0:1, 0:1)
df
   Var1 Var2 Var3 Var4
1     0    0    0    0
2     1    0    0    0
3     0    1    0    0
4     1    1    0    0
5     0    0    1    0
6     1    0    1    0
7     0    1    1    0
8     1    1    1    0
9     0    0    0    1
10    1    0    0    1
11    0    1    0    1
12    1    1    0    1
13    0    0    1    1
14    1    0    1    1
15    0    1    1    1
16    1    1    1    1

I am trying to create a Rank column based on some conditions on Var1, Var2, Var3, Var4

The order of ranking precedence is determined by the variables

Column Var1 has the highest preference and if it has a value of 1, then it is given a higher rank
Column Var2 has a higher preference over Var3, Var4
Columns Var1 and Var2 have higher preference over Var3, Var4
There is NO preference given to Var3 and Var4 and are only used as counts for ranking

If any rows have the same counts for Var3, Var4, then they are ranked with the same number.

My desired output is

   Var1 Var2 Var3 Var4 rank
1     0    0    0    0   12
2     1    0    0    0    6
3     0    1    0    0    9
4     1    1    0    0    3
5     0    0    1    0   11
6     1    0    1    0    5
7     0    1    1    0    8
8     1    1    1    0    2
9     0    0    0    1   11
10    1    0    0    1    5
11    0    1    0    1    8
12    1    1    0    1    2
13    0    0    1    1   10
14    1    0    1    1    4
15    0    1    1    1    7
16    1    1    1    1    1

I am trying to this manually but it is not very efficient

df %>%
  mutate(rank = case_when(
    Var1 == 1 & Var2 == 1 & Var3 == 1 & Var4 == 1~ "1",
    Var1 == 1 & Var2 == 1 & Var3 == 1 & Var4 == 0~ "2",
    TRUE                                     ~ ""
  ))

I want to apply the logic to a larger dataset. Is there an efficient way to do this? Can someone point me in the right direction?

Henrik · Accepted Answer · 2020-07-09T22:00:14.937

frank and frankv in data.table "accepts vectors, lists, data.frames or data.tables as input", which can be useful here.

First, frankv. It has a cols argument where columns to be ranked can be specified in a character vector - convenient if there are many column names which need to be generated programmatically. It also has a neat order argument.

library(data.table)
setDT(df)
df[ , Var34 := Var3 + Var4]
cols = c("Var1", "Var2", "Var34")
df[ , r := frankv(.SD, cols, order = -1L, ties.method = "dense")]
df[ , Var34 := NULL]

#     Var1 Var2 Var3 Var4  r
#  1:    0    0    0    0 12
#  2:    1    0    0    0  6
#  3:    0    1    0    0  9
#  4:    1    1    0    0  3
#  5:    0    0    1    0 11
#  6:    1    0    1    0  5
#  7:    0    1    1    0  8
#  8:    1    1    1    0  2
#  9:    0    0    0    1 11
# 10:    1    0    0    1  5
# 11:    0    1    0    1  8
# 12:    1    1    0    1  2
# 13:    0    0    1    1 10
# 14:    1    0    1    1  4
# 15:    0    1    1    1  7
# 16:    1    1    1    1  1

frank is handy for interactive use:

df[ , r := frank(.SD, -Var1, -Var2, -Var34, ties.method = "dense")]

Related answers: How to emulate SQLs rank functions in R?; Rank based on several variables

Thanks for this solution :-) data.table continues to amaze me with its simplicity and execution. I am now applying your logic to my large data frame. I will report back — Sharath, Jul 09 '20 at 18:53
It works perfectly on my bigger dataset. Thank you so much :-) — Sharath, Jul 09 '20 at 19:22

Rémi Coulaud · Answer 2 · 2020-07-09T18:30:04.473

0

I propose this which is a small trick :

df <- expand.grid(0:1, 0:1, 0:1, 0:1)

df[,2] <- df[,2] * 10
df[,3] <- df[,3] * 100
df[,4] <- df[,4] * 100

rank <- rowSums(df)
as.numeric(as.factor(rank))

edited Jul 09 '20 at 18:30

answered Jul 09 '20 at 17:56

Rémi Coulaud

1,684
1
8
19

1

FYI `apply(df, 1, sum)` is equivalent to `rowSums(df)` – Daniel O Jul 09 '20 at 18:29

Calculate rank based on several columns, with a precedence rule

2 Answers2