How can I create a new ties.method with the R rank() function?

Question

I'm trying to order this dataframe by population and date, so I'm using the order() and rank() functions:

> df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                   date       = c(rep(1950, 4), rep(2000, 4)),
                   population = c(500, 450, 350, 350, 650, 500, 500, 450))
> df
   idgeoville date    population
1  5          1950     500
2  8          1950     450
3  4          1950     350
4  3          1950     350
5  4          2000     650
6  5          2000     500
7  8          2000     500
8  8          2000     450

With ties.method = "first" I have no problem; I end up with this data frame:

   idgeoville date    population  rank
1  5          1950     500        1
2  8          1950     450        2
3  4          1950     350        3
4  3          1950     350        4
5  4          2000     650        1
6  5          2000     500        2
7  8          2000     500        3
8  8          2000     450        4
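
For reference, a table like the one above can be reproduced roughly as follows. This is only a sketch: the post does not show the exact call, so grouping by date with ave() and negating population (so the largest value gets rank 1) are assumptions.

```r
# Example data from the question.
df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                 date       = c(rep(1950, 4), rep(2000, 4)),
                 population = c(500, 450, 350, 350, 650, 500, 500, 450))

# Rank within each date, largest population first.
# ties.method = "first" breaks ties by order of appearance,
# so tied populations still receive different ranks.
df$rank <- ave(-df$population, df$date,
               FUN = function(x) rank(x, ties.method = "first"))

print(df)
```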

But what I actually want is a data frame in which equal populations get the same rank, with no gap in the ranks that follow, like this:

   idgeoville date    population  rank
1  5          1950     500        1
2  8          1950     450        2
3  4          1950     350        3
4  3          1950     350        3
5  4          2000     650        1
6  5          2000     500        2
7  8          2000     500        2
8  8          2000     450        3

How can I solve this in R? With a custom ties.method, or with some other R trick?

What about ties.method = "min", "max", or "average"? They all give tied values the same rank. – John, Jul 07 '10 at 21:42
With "min" and x2 <- c(1,1,2,3), I get 1 1 3 4; with "max" I get 2 2 3 4. The result I want for x2 is 1 1 2 3. – reyman64, Jul 08 '10 at 07:41
Or, use `max` and subtract the number of ties from the result? `2 2 3 4-1=1 1 2 3`. Now, the problem is to figure out the number of ties... Anyway, I just happened across this thread through Google. — Frank, May 14 '13 at 00:17

score 6 · Answer 1 · edited May 10 '16 at 00:58

A simpler way:

pop.rank <- as.numeric(factor(population))

edited May 10 '16 at 00:58

jbaums

answered Jul 09 '10 at 05:34

Gregory Demin

This uses only `population` and ignores `date` which was requested by the OP. So, it will create an overall rank but not a separate ranking for each `date`. – Uwe Jan 04 '17 at 07:40
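
One possible way to address the point raised in this comment is to apply the factor trick within each date. The sketch below uses base R's ave() for the grouping and negates population so that the largest value gets rank 1; both choices, and the column name rank, are assumptions rather than part of the original answer.

```r
# Example data from the question.
df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                 date       = c(rep(1950, 4), rep(2000, 4)),
                 population = c(500, 450, 350, 350, 650, 500, 500, 450))

# Within each date, convert the negated populations to a factor.
# Factor levels are sorted in ascending order, so the integer codes
# number the distinct values from largest population to smallest,
# and tied populations share the same code.
df$rank <- ave(-df$population, df$date,
               FUN = function(x) as.numeric(factor(x)))

print(df)
```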

score 4 · Accepted Answer · edited Nov 21 '11 at 20:38

I believe there is no option to do it with rank; here is a custom function that will do what you want, but it may be too slow if your data is huge:

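Rank <- function(d) {
    # Distinct values, largest first; a value's position here is its rank.
    j <- unique(rev(sort(d)))
    # Look up each element's position among the distinct values.
    sapply(d, function(dd) which(dd == j))
}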
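
If speed matters, the same idea can be written without the element-by-element lookup. The following is a sketch, not a benchmarked replacement: match() finds every position in a single vectorised call, and ave() applies the function separately for each date. The helper name dense_rank_desc is hypothetical.

```r
# Example data from the question.
df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                 date       = c(rep(1950, 4), rep(2000, 4)),
                 population = c(500, 450, 350, 350, 650, 500, 500, 450))

# Hypothetical helper: position of each value among the distinct
# values sorted from largest to smallest (ties share a rank, no gaps).
dense_rank_desc <- function(x) {
    match(x, sort(unique(x), decreasing = TRUE))
}

# Rank population separately within each date.
df$rank <- ave(df$population, df$date, FUN = dense_rank_desc)

print(df)
```

On the comment's example, dense_rank_desc(-c(1, 1, 2, 3)) gives 1 1 2 3, since negating the input turns the largest-first ranking into a smallest-first one.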

edited Nov 21 '11 at 20:38

Peter Mortensen

answered Jul 07 '10 at 20:57

mbq

Thanks a lot, that works! But if anyone has a better or faster solution, perhaps from an R package, I'll take it. – reyman64 Jul 08 '10 at 08:14

score 1 · Answer 3 · answered Jul 08 '10 at 13:10

This answers a slightly different question, namely how to sort a data.frame object based on multiple columns. To do this, you could use the function sort_df in package reshape:

> library(reshape)
> sort_df(df,vars=c('date','population'))
  idgeoville date population
3          4 1950        350
4          3 1950        350
2          8 1950        450
1          5 1950        500
8          8 2000        450
6          5 2000        500
7          8 2000        500
5          4 2000        650

This doesn't answer the question. In addition, `population` is sorted in ascending order while for ranking I would expect descending order (largest first). — Uwe, Jan 04 '17 at 07:45
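
For the sorting part alone, base R's order() can do the same job without an extra package, and negating a numeric column sorts it in descending order. A minimal sketch, assuming the goal is ascending date with the largest population first within each date (the name sorted is illustrative):

```r
# Example data from the question.
df <- data.frame(idgeoville = c(5, 8, 4, 3, 4, 5, 8, 8),
                 date       = c(rep(1950, 4), rep(2000, 4)),
                 population = c(500, 450, 350, 350, 650, 500, 500, 450))

# Sort by date (ascending), then by population (descending).
sorted <- df[order(df$date, -df$population), ]

print(sorted)
```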
