
I want to create a rank variable.

Setup

test <- data.frame(column1 = c(5,5,5,6,6,7,7,7,8))
test$rank <- rank(test$column1)

test
  column1 rank
1       5  2.0
2       5  2.0
3       5  2.0
4       6  4.5
5       6  4.5
6       7  7.0
7       7  7.0
8       7  7.0
9       8  9.0

The answer I want is 1, 1, 1, 2, 2, 3, 3, 3, 4: tied values should share a rank, with no gaps between consecutive ranks.

JC3019
  • The answer you want isn't a rank. If the first 3 entries are tied, the next 2 can't both be ranked "2", because ranks "2" and "3" belong to the first 3. You can play around with `ties.method` to check the different options. To reach your desired output, you could do something like `match(test$column1, sort(unique(test$column1)))` – David Arenburg Apr 19 '20 at 07:55
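None of rank()'s `ties.method` options closes the gaps that ties leave behind, which is why the comment falls back on match(). That approach works in two steps: build the ascending list of distinct values, then look up each element's position in it. A minimal base R sketch of those steps on the question's data (`x` and `levels_sorted` are names introduced here for illustration):

```r
x <- c(5, 5, 5, 6, 6, 7, 7, 7, 8)

# Step 1: the distinct values in ascending order form the "dense" rank scale
levels_sorted <- sort(unique(x))
levels_sorted                      # 5 6 7 8

# Step 2: match() returns the position of each element of x in that scale,
# so tied values share a rank and consecutive ranks have no gaps
dense <- match(x, levels_sorted)
dense                              # 1 1 1 2 2 3 3 3 4
```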

3 Answers


You need dplyr's dense_rank(), which gives tied values the same rank without leaving gaps:

test <- data.frame(column1 = c(5,5,5,6,6,7,7,7,8))
test$rank <- dplyr::dense_rank(test$column1)

How the dplyr window ranking functions compare:

library(dplyr)

test %>%
  rename(input = column1) %>%
  mutate(row_num_output    = row_number(input),
         rank_output       = min_rank(input),
         dense_rank_output = dense_rank(input))

[Image: output of the code above for the question's input]
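Since the output was posted as an image, here is a base R sketch that rebuilds the same comparison without dplyr. It assumes the equivalences described in dplyr's documentation: row_number() behaves like `rank(ties.method = "first")` and min_rank() like `rank(ties.method = "min")`, while dense_rank() is reproduced here with match() against the sorted distinct values. These stand-ins assume NA-free input; they are not dplyr's own implementation.

```r
x <- c(5, 5, 5, 6, 6, 7, 7, 7, 8)

# Base R stand-ins for the three dplyr window functions (NA-free input assumed)
out <- data.frame(
  input             = x,
  row_num_output    = rank(x, ties.method = "first"),  # like row_number(): ties broken by position
  rank_output       = rank(x, ties.method = "min"),    # like min_rank(): ties share the lowest rank, gaps follow
  dense_rank_output = match(x, sort(unique(x)))        # like dense_rank(): ties share a rank, no gaps
)
print(out)
# row_num_output:    1 2 3 4 5 6 7 8 9
# rank_output:       1 1 1 4 4 6 6 6 9
# dense_rank_output: 1 1 1 2 2 3 3 3 4
```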

nikn8
  • [Here](http://www.besttechtools.com/articles/article/sql-rank-functions) is one of the best places to understand window ranking functions. It's in SQL, but the concepts are the same regardless of language. – nikn8 Apr 19 '20 at 08:06

A data.table solution uses frank() (fast rank), which supports `ties.method = "dense"`:

library(data.table)
test <- data.table(column1 = c(5,5,5,6,6,7,7,7,8))
test[, rank := frank(column1, ties.method = "dense")]

Alternatively, a base R solution using match():

test$rank <- match(test$column1, unique(test$column1[order(test$column1)]))
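Because this version sorts the distinct values itself, it does not depend on the row order of `test`. A quick check with the question's values shuffled into an arbitrary order (the shuffled vector is made up for illustration):

```r
# The question's values in an arbitrary (unsorted) order
test <- data.frame(column1 = c(7, 5, 8, 6, 5, 7, 6, 5, 7))

# Position of each value among the sorted distinct values
test$rank <- match(test$column1, unique(test$column1[order(test$column1)]))
test$rank   # 3 1 4 2 1 3 2 1 3
```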
rg255

There are multiple ways to do this:

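In dplyr, you can use group_indices() on the data grouped by column1 (passing the column directly, as in `group_indices(test, column1)`, is deprecated in recent dplyr versions):

test$rank <- dplyr::group_indices(dplyr::group_by(test, column1))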

Or, in base R, combine cumsum() with duplicated():

test$rank <- cumsum(!duplicated(test$column1))

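Make sure column1 is sorted before using the cumsum() approach: cumsum(!duplicated(x)) counts the distinct values seen so far, so it only gives ranks when x is in ascending order. group_indices() numbers the groups by their sorted keys, so it does not depend on row order.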
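To see why the ordering matters for cumsum(), and one way to get correct ranks while keeping the original row order, here is a base R sketch on the question's values shuffled (`x`, `wrong`, `o` and `r` are illustrative names). The idea is to compute the running count on the sorted values and then write each result back to its original position through order().

```r
x <- c(7, 5, 8, 6, 5, 7, 6, 5, 7)   # the question's values, unsorted

# Applied directly, each row just gets the running count of distinct values
wrong <- cumsum(!duplicated(x))
wrong                                # 1 2 3 4 4 4 4 4 4

# Rank the sorted values, then write each rank back to its original row
o <- order(x)
r <- integer(length(x))
r[o] <- cumsum(!duplicated(x[o]))
r                                    # 3 1 4 2 1 3 2 1 3
```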

Ronak Shah