
I have to rank a data set by several of its variables, grouped by another variable. When I use ranking methods on a data.table, the ranks come out as decimals. I need them to be integers with no decimal part.

Below is a summary of what I need. I'm borrowing somebody else's example from another question on this website (also about ranking methods). I found the answer to that question useful, but it doesn't show how to make the ranking outcome an integer without decimals. That's why I'm copying it here as the starting point for this question (asking a different question under an answer is not allowed).

I need to rank based upon several variables, grouped by one (or several variables), and then get an integer ranking without decimals.

Here's this other person's example:

He creates the data table:

library(data.table)

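# The five dates are recycled across the ten ids
dt1 <- data.table(id = c('11', '11', '11', '22', '22',
                         '88', '99', '44', '44', '55'),
                  date = as.Date(c("01-01-2016",
                                   "01-02-2016",
                                   "01-02-2016",
                                   "02-01-2016",
                                   "02-02-2016"),
                                 format = "%m-%d-%Y"))

# Sort by date, then by id; the second setkey replaces the first as the key
# but keeps the dates ordered within each id
setkey(dt1, date)
setkey(dt1, id)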
dt1
    id       date
 1: 11 2016-01-01
 2: 11 2016-01-02
 3: 11 2016-01-02
 4: 22 2016-02-01
 5: 22 2016-02-02
 6: 44 2016-01-02
 7: 44 2016-02-01
 8: 55 2016-02-02
 9: 88 2016-01-01
10: 99 2016-01-02

And here he ranks based on the variable date and grouped by id:

dt1[, rank := frank(date), by = list(id)]
dt1

    id       date  rank
 1: 11 2016-01-01   1.0
 2: 11 2016-01-02   2.5
 3: 11 2016-01-02   2.5
 4: 22 2016-02-01   1.0
 5: 22 2016-02-02   2.0
 6: 44 2016-01-02   1.0
 7: 44 2016-02-01   2.0
 8: 55 2016-02-02   1.0
 9: 88 2016-01-01   1.0
10: 99 2016-01-02   1.0

Instead, the result should look like this, with integer ranks only:

    id       date  rank
 1: 11 2016-01-01   1
 2: 11 2016-01-02   2
 3: 11 2016-01-02   2
 4: 22 2016-02-01   1
 5: 22 2016-02-02   2
 6: 44 2016-01-02   1
 7: 44 2016-02-01   2
 8: 55 2016-02-02   1
 9: 88 2016-01-01   1
10: 99 2016-01-02   1
Cettt
Paco

1 Answer


You can specify how frank handles ties. Its ties.method argument defaults to "average", which produces decimal ranks whenever values are tied. See ?frank for details.

You could e.g. set

dt1[, rank := frank(date, ties.method = "min"), by = list(id)]

to get integer ranks.
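
To see how the choice of ties.method changes the result, here is a minimal sketch on a made-up vector (x is hypothetical, not part of the question's data). It assumes only that data.table is installed. Note that "min" and "dense" both give the 1, 2, 2 pattern asked for in the question, but "min" leaves a gap after a tie while "dense" does not.

```r
library(data.table)

# Hypothetical toy vector with one tie (the two 20s)
x <- c(10, 20, 20, 30)

avg_rank   <- frank(x)                          # default: ties.method = "average"
min_rank   <- frank(x, ties.method = "min")     # tied values share the lowest rank
dense_rank <- frank(x, ties.method = "dense")   # like "min", but without gaps
first_rank <- frank(x, ties.method = "first")   # ties broken by order of appearance

avg_rank    # 1.0 2.5 2.5 4.0  (double)
min_rank    # 1 2 2 4          (integer)
dense_rank  # 1 2 2 3          (integer)
first_rank  # 1 2 3 4          (integer)
```

Which one fits depends on whether the rank following a tie should skip a number ("min") or continue consecutively ("dense").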

Cettt
  • Ah! OK... I knew I had to add ties.method, but I was adding it outside frank, like rank := frank(date), by = list(id), ties.method = "min"], so it was obviously not working. Thank you very much, that was very helpful. But now, with your suggestion, I get an error message: "Type of RHS ('integer') must match LHS ('double'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)" – Paco Mar 26 '19 at 17:05
  • @Paco Clear the column (where you have the decimal values currently) `dt1[, rank := NULL]` and try again. (You can't change the type of a column in a by= expression.) – Frank Mar 26 '19 at 23:34
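
Putting the answer and the last comment together, the sequence might look like the sketch below. It uses only the first five rows of the question's data to stay short, and it assumes the decimal rank column already exists from an earlier call, as in the comment thread.

```r
library(data.table)

# Shortened version of the question's table (first five rows only)
dt1 <- data.table(id   = c("11", "11", "11", "22", "22"),
                  date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-02",
                                   "2016-02-01", "2016-02-02")))

# Earlier attempt: the default ties.method = "average" creates a double column
dt1[, rank := frank(date), by = id]

# Drop the double column first, so the new integer column is not
# assigned into an existing column of a different type
dt1[, rank := NULL]

# Re-create the column with integer ranks
dt1[, rank := frank(date, ties.method = "min"), by = id]

dt1$rank  # 1 2 2 1 2
```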