14

When there are ties in the original data, is there a way to create a ranking without gaps in the ranks (consecutive, integer rank values)? Suppose:

x <-  c(10, 10, 10, 5, 5, 20, 20)
rank(x)
# [1] 4.0 4.0 4.0 1.5 1.5 6.5 6.5

In this case the desired result would be:

my_rank(x)
[1] 2 2 2 1 1 3 3

I've tried all the options for the ties.method argument (average, max, min, random), none of which is designed to produce the desired result.

Is it possible to achieve this with the rank() function?

Henrik
Brandon Bertelsen

8 Answers

16

A modification of crayola's solution, using match instead of merge:

x_unique <- unique(x)
x_ranks <- rank(x_unique)
x_ranks[match(x,x_unique)]

Edit:

Or as a one-liner, as per @hadley's comment:

match(x, sort(unique(x)))
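
To see why the one-liner works, it may help to split it into steps: sort(unique(x)) gives the distinct values in increasing order, and match() returns the position of each element of x in that lookup vector, which is its gap-free rank. A minimal sketch using the question's data (the name `dense_rank_base` is hypothetical, chosen only for illustration):

```r
# Hypothetical helper name; it simply wraps the one-liner above
dense_rank_base <- function(x) {
  lookup <- sort(unique(x))  # distinct values, ascending (sort() drops NA by default)
  match(x, lookup)           # position of each x in lookup = dense rank; NA stays NA
}

x <- c(10, 10, 10, 5, 5, 20, 20)
sort(unique(x))
# [1]  5 10 20
dense_rank_base(x)
# [1] 2 2 2 1 1 3 3
```

Because match() returns integers, the result is already an integer vector and needs no further conversion.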
tjebo
Marek
  • Excellent! As it turns out (benchmarking with rep(x, 100000)), this seems to be the fastest solution. In terms of speed: Marek > Prasad (revised) > Chase > Prasad (first) > Crayola – crayola Feb 06 '11 at 21:39
  • You could do this all in one line: `match(x, sort(unique(x)))` – hadley Feb 07 '11 at 00:23
  • @hadley As always, you are right ;) I figured out this solution after posting, but the timings were surprising, so I held off on the update. – Marek Feb 07 '11 at 11:09
9

The "loopless" way to do it is to simply treat the vector as an ordered factor, then convert it to numeric:

> as.numeric( ordered( c( 10,10,10,10, 5,5,5, 10, 10 ) ) )
[1] 2 2 2 2 1 1 1 2 2
> as.numeric( ordered( c(0.5,0.56,0.76,0.23,0.33,0.4) ))
[1] 4 5 6 1 2 3
> as.numeric( ordered( c(1,1,2,3,4,5,8,8) ))
[1] 1 1 2 3 4 5 6 6

Update: Another way, which seems faster, is to use findInterval with sort(unique(x)):

> x <- c( 10, 10, 10, 10, 5,5,5, 10, 10)
> findInterval( x, sort(unique(x)))
[1] 2 2 2 2 1 1 1 2 2

> x <- round( abs( rnorm(1000000)*10))
> system.time( z <- as.numeric( ordered( x )))
   user  system elapsed 
  0.996   0.025   1.021 
> system.time( z <- findInterval( x, sort(unique(x))))
   user  system elapsed 
  0.077   0.003   0.080 
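
Timings like the ones above depend on the machine and R version, so before swapping one approach for another it may be worth checking that they agree on your data. A small sketch, assuming numeric input without missing values (the seed and sample size are arbitrary choices, not from the original answer):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- round(abs(rnorm(1000) * 10))

r_factor   <- as.numeric(ordered(x))             # factor codes
r_interval <- findInterval(x, sort(unique(x)))   # count of distinct values <= x
r_match    <- match(x, sort(unique(x)))          # position among distinct values

# All three should give the same dense ranks
stopifnot(all(r_factor == r_interval), all(r_interval == r_match))
```

If the check passes silently, the three results are identical and the choice can be made on speed alone.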
Prasad Chalasani
4

Another way to think about it:

x <-  c(10,10,10,5,5,20,20)
as.numeric(as.factor(x))
[1] 2 2 2 1 1 3 3
BENY
4

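I can think of a quick function to do this. It's not optimal because of the for loop, but it works :)

x <- c(1,1,2,3,4,5,8,8)

foo <- function(x){
    su <- sort(unique(x))
    out <- integer(length(x))
    # Write ranks into a separate vector: overwriting x in place can
    # collide when a rank equals a later value (e.g. x = c(0.5, 1))
    for (i in seq_along(su)) out[x == su[i]] <- i
    out
}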

foo(x)

[1] 1 1 2 3 4 5 6 6
Sacha Epskamp
  • This works wonderfully. Thank you. Also of note, it's very simple to change the direction of the sort if you need a decreasing rank! Cheers! – Brandon Bertelsen Feb 06 '11 at 20:52
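
As the comment above notes, reversing the direction only requires reversing the sort of the unique values. A minimal sketch of a decreasing dense rank; it uses the match-based approach from another answer rather than the loop in foo, and the variable name `desc_rank` is hypothetical:

```r
x <- c(10, 10, 10, 5, 5, 20, 20)

# Sort the distinct values in decreasing order so the largest gets rank 1
desc_rank <- match(x, sort(unique(x), decreasing = TRUE))
desc_rank
# [1] 2 2 2 3 3 1 1
```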
3

If you don't mind leaving base-R:

library(data.table)
frank(x, ties.method = "dense")
[1] 2 2 2 1 1 3 3

data:

x <-  c(10, 10, 10, 5, 5, 20, 20)
s_baldur
2

Another function that does this, but it seems inefficient. There is no for loop, but I doubt it is more efficient than Sacha's suggestion!

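x <- c(1,1,2,3,4,5,8,8)
fancy.rank <- function(x) {
    x.unique <- unique(x)
    d1 <- data.frame(x = x, id = seq_along(x))   # remember original positions
    d2 <- data.frame(x = x.unique, r = rank(x.unique))
    m <- merge(d1, d2, by = "x")                 # merge() sorts rows by x
    m$r[order(m$id)]                             # restore the original order
}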

fancy.rank(x)

[1] 1 1 2 3 4 5 6 6
crayola
2

For those fond of using dplyr:

library(dplyr)
dense_rank(x)

[1] 2 2 2 1 1 3 3
tmfmnk
-1

What about sort()?

x <- c(1,1,2,3,4,5)
sort(x)

> sort(x) 
[1] 1 1 2 3 4 5
Chase
  • This is correct only by coincidence. Real data aren't as clean as in the example, e.g. try: x <- c(0.5,0.56,0.76,0.23,0.33,0.4) – Brandon Bertelsen Feb 06 '11 at 20:52
  • @Brandon - Maybe I'm not comprehending some restriction of your need here...probably this part "I can't have two elements at either end of the range being greater than 1 or max(range)." What is the desired output from your example in the comment above? If that is more representative than what is in your question, maybe you could edit the question to reflect that? – Chase Feb 06 '11 at 21:01
  • Apologies if it wasn't clear. The question is about ranking data. What you've done here sorts the data, and the sorted values just happen to match the sequence of ranks I'm trying to get. The goal is to get the ranks, not just the sorted values. – Brandon Bertelsen Feb 06 '11 at 21:20
  • Also, w.r.t. the comment about greater than 1 or max(range). If you look in my question, the example I've provided for rank(x) returns 1.5,1.5... basically, I wanted them to be 1,1,... (not greater than 1) – Brandon Bertelsen Feb 06 '11 at 21:29
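
The distinction drawn in these comments can be seen directly on the vector from the first comment: sort() returns the values themselves in a new order, whereas a rank reports each element's position while keeping the original order. A short sketch, using the match-based approach from another answer for the ranks (the variable names are illustrative):

```r
x <- c(0.5, 0.56, 0.76, 0.23, 0.33, 0.4)

sorted <- sort(x)                    # the values themselves, reordered
sorted
# [1] 0.23 0.33 0.40 0.50 0.56 0.76

ranks <- match(x, sort(unique(x)))   # each element's rank, in original order
ranks
# [1] 4 5 6 1 2 3
```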