How to get ranks with no gaps when there are ties among values?

Question

When there are ties in the original data, is there a way to create a ranking without gaps in the ranks (consecutive, integer rank values)? Suppose:

x <-  c(10, 10, 10, 5, 5, 20, 20)
rank(x)
# [1] 4.0 4.0 4.0 1.5 1.5 6.5 6.5

In this case the desired result would be:

my_rank(x)
[1] 2 2 2 1 1 3 3

I've tried all the options for the ties.method argument (average, max, min, random); none of them is designed to produce the desired result.
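To make the gaps concrete, here is a small sketch of what the built-in tie rules return for this vector. The "first" method is included in addition to the ones listed above, and "random" is left out because its output changes from run to run.

```r
x <- c(10, 10, 10, 5, 5, 20, 20)

# Each built-in tie rule either leaves gaps or breaks the ties apart
rank(x, ties.method = "average")  # 4.0 4.0 4.0 1.5 1.5 6.5 6.5
rank(x, ties.method = "min")      # 3 3 3 1 1 6 6
rank(x, ties.method = "max")      # 5 5 5 2 2 7 7
rank(x, ties.method = "first")    # 3 4 5 1 2 6 7
```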

Is it possible to achieve this with the rank() function?

score 16 · Answer 1 · edited Dec 19 '18 at 16:02


A modification of crayola's solution, using match instead of merge:

x_unique <- unique(x)
x_ranks <- rank(x_unique)
x_ranks[match(x,x_unique)]

edit

or in a one-liner, as per @hadley's comment:

match(x, sort(unique(x)))
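The one-liner works because sort(unique(x)) is a lookup table of the distinct values in increasing order, and match() returns each element's position in that table. A minimal sketch wrapping it in a function follows; the name dense_rank_base is hypothetical, chosen only for illustration, and the NA behaviour shown relies on sort() dropping NA by default.

```r
# dense_rank_base is a hypothetical helper name, not a base R function
dense_rank_base <- function(x) {
    # sort() drops NA by default, so the table holds each distinct value once
    distinct <- sort(unique(x))
    # match() gives the position of each element in the table; NA stays NA
    match(x, distinct)
}

dense_rank_base(c(10, 10, 10, 5, 5, 20, 20))
# [1] 2 2 2 1 1 3 3
dense_rank_base(c(10, NA, 5, 10))
# [1]  2 NA  1  2
```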

edited Dec 19 '18 at 16:02

tjebo


answered Feb 06 '11 at 21:36

Marek


Excellent! As it turns out, benchmarking with rep(x, 100000) suggests that this is the fastest solution. In terms of speed: Marek > Prasad (revised) > Chase > Prasad (first) > Crayola – crayola Feb 06 '11 at 21:39

You could do this all in one line: `match(x, sort(unique(x)))` – hadley Feb 07 '11 at 00:23

@hadley As always, you are right ;) I figured out this solution after posting, but the timings were surprising, so I held off on the update. – Marek Feb 07 '11 at 11:09

Prasad Chalasani · Answer 2 · 2011-02-06T21:33:59.597

The "loopless" way to do it is to simply treat the vector as an ordered factor, then convert it to numeric:

> as.numeric( ordered( c( 10,10,10,10, 5,5,5, 10, 10 ) ) )
[1] 2 2 2 2 1 1 1 2 2
> as.numeric( ordered( c(0.5,0.56,0.76,0.23,0.33,0.4) ))
[1] 4 5 6 1 2 3
> as.numeric( ordered( c(1,1,2,3,4,5,8,8) ))
[1] 1 1 2 3 4 5 6 6

Update: Another way, which seems faster, is to use findInterval and sort(unique()):

> x <- c( 10, 10, 10, 10, 5,5,5, 10, 10)
> findInterval( x, sort(unique(x)))
[1] 2 2 2 2 1 1 1 2 2

> x <- round( abs( rnorm(1000000)*10))
> system.time( z <- as.numeric( ordered( x )))
   user  system elapsed 
  0.996   0.025   1.021 
> system.time( z <- findInterval( x, sort(unique(x))))
   user  system elapsed 
  0.077   0.003   0.080
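findInterval gives the dense rank here because every element of x is exactly equal to one of the breakpoints, so the index of the interval it falls into is the position of its value among the sorted distinct values. The sketch below checks, on a smaller simulated vector, that this agrees with the match-based approach; the seed and vector length are arbitrary choices made for the illustration.

```r
set.seed(1)  # arbitrary seed, only so the check is reproducible
x <- round(abs(rnorm(1000)) * 10)

breaks <- sort(unique(x))              # distinct values in increasing order
by_match <- match(x, breaks)           # position of each value in the table
by_interval <- findInterval(x, breaks) # index of the interval starting at that value

all(by_match == by_interval)
# [1] TRUE
```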

score 4 · Answer 3 · answered May 16 '17 at 21:57


Another way to think about it is to convert the vector to a factor and take its integer codes:

x <-  c(10,10,10,5,5,20,20)
as.numeric(as.factor(x))
[1] 2 2 2 1 1 3 3
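The factor route produces dense ranks because a factor's levels are the sorted distinct values and its integer codes are positions among those levels. A short sketch comparing it with the match-based one-liner on the same data follows. One caveat to keep in mind: factor() goes through a character conversion of the values, so it may be safer to prefer match() when the data contains doubles that differ only in their last few digits.

```r
x <- c(10, 10, 10, 5, 5, 20, 20)

codes <- as.integer(factor(x))        # integer codes of the factor levels
lookup <- match(x, sort(unique(x)))   # positions among sorted distinct values

codes
# [1] 2 2 2 1 1 3 3
identical(codes, lookup)
# [1] TRUE
```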

answered May 16 '17 at 21:57

BENY


Sacha Epskamp · Accepted Answer · 2011-02-06T20:31:51.480

4

I can think of a quick function to do this. It's not optimal because it uses a for loop, but it works :)

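x <- c(1, 1, 2, 3, 4, 5, 8, 8)

foo <- function(x) {
    su <- sort(unique(x))
    # Write ranks into a separate vector: overwriting x in place can confuse
    # a freshly assigned rank with an original value, e.g. for x = c(0, 1)
    out <- integer(length(x))
    for (i in seq_along(su)) out[x == su[i]] <- i
    out
}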

foo(x)

[1] 1 1 2 3 4 5 6 6

edited Feb 06 '11 at 20:31

answered Feb 06 '11 at 20:13

Sacha Epskamp


This works wonderfully. Thank you. Also of note, it's very simple to change the direction of the sort if you need a decreasing rank! Cheers! – Brandon Bertelsen Feb 06 '11 at 20:52
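As a sketch of the decreasing variant mentioned in the comment: reversing the order of the distinct values is enough, for instance with sort(unique(x), decreasing = TRUE). The example below uses the match-based form on the vector from the question rather than the loop, purely for brevity.

```r
x <- c(10, 10, 10, 5, 5, 20, 20)

# Largest value gets rank 1, still without gaps
decreasing_rank <- match(x, sort(unique(x), decreasing = TRUE))
decreasing_rank
# [1] 2 2 2 3 3 1 1
```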

score 3 · Answer 5 · answered Oct 02 '18 at 13:00


If you don't mind leaving base-R:

library(data.table)
frank(x, ties.method = "dense")
[1] 2 2 2 1 1 3 3

data:

x <-  c(10, 10, 10, 5, 5, 20, 20)

answered Oct 02 '18 at 13:00

s_baldur


score 2 · Answer 6 · answered Feb 06 '11 at 21:10

Another function that does this, but it seems inefficient. There is no for loop, but I doubt it is more efficient than Sacha's suggestion!

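x <- c(1, 1, 2, 3, 4, 5, 8, 8)
fancy.rank <- function(x) {
    x.unique <- unique(x)
    # Keep the original position, because merge() sorts its result by the key
    d1 <- data.frame(x = x, pos = seq_along(x))
    d2 <- data.frame(x = x.unique, rank = rank(x.unique))
    m <- merge(d1, d2, by = "x")
    # Restore the input order before returning the ranks
    m$rank[order(m$pos)]
}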

fancy.rank(x)

[1] 1 1 2 3 4 5 6 6

score 2 · Answer 7 · answered Jun 26 '19 at 12:35


For those fond of using dplyr:

library(dplyr)
dense_rank(x)

[1] 2 2 2 1 1 3 3

answered Jun 26 '19 at 12:35

tmfmnk


This function `dense_rank` itself is defined as `match(x, sort(unique(x)))`—exactly what @hadley proposed. – Andreï V. Kostyrka Jul 20 '22 at 12:32

score -1 · Answer 8 · answered Feb 06 '11 at 20:47


What about sort()?

x <- c(1,1,2,3,4,5)
sort(x)

> sort(x) 
[1] 1 1 2 3 4 5

answered Feb 06 '11 at 20:47

Chase


This is correct only by coincidence. Real data isn't as clean as the numbers in the example; i.e., try: x <- c(0.5,0.56,0.76,0.23,0.33,0.4) – Brandon Bertelsen Feb 06 '11 at 20:52
@Brandon - Maybe I'm not comprehending some restriction of your need here...probably this part "I can't have two elements at either end of the range being greater than 1 or max(range)." What is the desired output from your example in the comment above? If that is more representative than what is in your question, maybe you could edit the question to reflect that? – Chase Feb 06 '11 at 21:01
Apologies if it wasn't clear. The question is about ranking data. What you've done here sorts the data, and the sorted values just happen to be the same sequence of numbers as the ranks I'm trying to get. The goal is to get the ranks, not just the sorted values. – Brandon Bertelsen Feb 06 '11 at 21:20
Also, w.r.t. the comment about greater than 1 or max(range). If you look in my question, the example I've provided for rank(x) returns 1.5,1.5... basically, I wanted them to be 1,1,... (not greater than 1) – Brandon Bertelsen Feb 06 '11 at 21:29
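A short sketch of the distinction discussed in these comments, using the vector from the first comment: sort() returns the values themselves in order, while a dense rank returns each element's position among the sorted distinct values, in the original order of the data. The match-based form is used here only as one possible way to compute the ranks.

```r
x <- c(0.5, 0.56, 0.76, 0.23, 0.33, 0.4)

sort(x)                     # the sorted values, not ranks
# [1] 0.23 0.33 0.40 0.50 0.56 0.76

match(x, sort(unique(x)))   # the rank of each element, in input order
# [1] 4 5 6 1 2 3
```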
