
I have a data set with 5 varieties (var) and 3 variables (x, y, z). I need to rank the varieties on each of the 3 variables. When there is a tie, the ranking leaves a gap before the next rank, so I cannot get consecutive ranks. Here is my data:

 x<-c(3,3,4,5,5)
 y<-c(5,6,4,4,5)
 z<-c(2,3,4,3,5)
 df<-cbind(x,y,z)
 rownames(df) <- paste0("G", 1:nrow(df))
 df <- data.frame(var = row.names(df), df)

I tried the following code:

res <- sapply(df, rank,ties.method='min')
res

     var x y z
[1,]   1 1 3 1
[2,]   2 1 5 2
[3,]   3 3 1 4
[4,]   4 4 1 2
[5,]   5 4 3 5

For x I got the ranks 1 1 3 4 4 instead of 1 1 2 3 3. The same gaps appear for y and z.

My desired result is

 >res
     var x y z
[1,]   1 1 2 1
[2,]   2 1 3 2
[3,]   3 2 1 3
[4,]   4 3 1 2
[5,]   5 3 2 4

I will be grateful if anyone helps me.

Ronak Shah
Rokib
    Dupe-oids: [Create a ranking variable with dplyr?](https://stackoverflow.com/questions/26106408/create-a-ranking-variable-with-dplyr), e.g. [this](https://stackoverflow.com/a/42377830/1851712). More general post: [How to emulate SQL “partition by” in R?](https://stackoverflow.com/questions/11446254/how-to-emulate-sql-partition-by-in-r) (which includes the `as.integer(as.factor(x))` method). And a [`data.table` version on ranking multiple columns](https://stackoverflow.com/a/28141041/1851712). – Henrik Jun 26 '19 at 10:25

3 Answers


One dplyr possibility could be:

library(dplyr)

df %>%
 mutate_at(2:4, list(~ dense_rank(.)))

  var x y z
1  G1 1 2 1
2  G2 1 3 2
3  G3 2 1 3
4  G4 3 1 2
5  G5 3 2 4

Or a base R possibility:

df[2:4] <- lapply(df[2:4], function(x) match(x, sort(unique(x))))
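
To see why the base R line works: `sort(unique(x))` lists the distinct values in increasing order, and `match()` returns each element's position in that list, so tied values share a rank and no rank is skipped. A minimal sketch on the question's `x` column (the helper name `dense_rank_base` is made up here for illustration and does not come from any package):

```r
# Hypothetical helper: position of each value among the sorted distinct values
dense_rank_base <- function(x) match(x, sort(unique(x)))

x <- c(3, 3, 4, 5, 5)
rank(x, ties.method = "min")  # 1 1 3 4 4  (gap after each tie)
dense_rank_base(x)            # 1 1 2 3 3  (consecutive ranks)
```
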
tmfmnk

Well, an easy way would be to convert to factor and then integer

df[] <- lapply(df, function(x) as.integer(factor(x)))
df
#   var x y z
#G1   1 1 2 1
#G2   2 1 3 2
#G3   3 2 1 3
#G4   4 3 1 2
#G5   5 3 2 4
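
Note that `df[]` covers every column, so `var` is replaced by integer codes as well, as the output above shows. If the variety labels should be kept, one option is to restrict the assignment to the ranked columns. This is only a sketch, assuming the columns to rank are the 2nd to 4th, as in the question:

```r
# Rebuild the question's data
df <- data.frame(var = paste0("G", 1:5),
                 x = c(3, 3, 4, 5, 5),
                 y = c(5, 6, 4, 4, 5),
                 z = c(2, 3, 4, 3, 5))

# Rank only columns 2 to 4; var keeps its G1..G5 labels
df[2:4] <- lapply(df[2:4], function(col) as.integer(factor(col)))
df
```
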
Ronak Shah

We can use data.table

library(data.table)
library(dplyr)  # dense_rank() comes from dplyr, not data.table
setDT(df)[, (2:4) := lapply(.SD, dense_rank), .SDcols = 2:4]
df
#   var x y z
#1:  G1 1 2 1
#2:  G2 1 3 2
#3:  G3 2 1 3
#4:  G4 3 1 2
#5:  G5 3 2 4
akrun