I have the following input data frame and would like to group it by Gene and then sort it by descending Expression within each gene. After that, I'd like to add a Rank column that ranks the rows within each Gene by Expression, so that the row with the highest Expression for a gene gets rank 1.

I've already done the grouping and sorting (below), but I'm struggling with the ranking.

dat_sorted <- dat %>% select(Gene, Expression, Sample) %>%
    group_by(Gene) %>% 
    arrange(Gene, desc(Expression))


**INPUT (dat)**

Gene                Expression      Sample
ENSG00000000027     2.79336700      HSB431
ENSG00000000938     0.83478860      HSB414
ENSG00000000003     2.40009100      HSB618
ENSG00000000938     1.75148448      HSB671
ENSG00000000938     1.52182467      HSB670
ENSG00000000938     0.62174432      HSB459
ENSG00000000003     2.81561500      HSB671



**EXPECTED OUTPUT**

Gene                Expression      Sample      Rank
ENSG00000000003     2.81561500      HSB671      1
ENSG00000000003     2.40009100      HSB618      2
ENSG00000000027     2.79336700      HSB431      1
ENSG00000000938     1.75148448      HSB671      1
ENSG00000000938     1.52182467      HSB670      2
ENSG00000000938     0.83478860      HSB414      3
ENSG00000000938     0.62174432      HSB459      4

**UPDATE**

When trying:

dat %>% 
  group_by(Gene) %>%
  mutate(Rank = dense_rank(Expression)) %>% 
  arrange(Gene, Expression, Rank)

I get:

Gene                Sample   Expression     Rank
ENSG00000000003     HSB626   3.52200400     31107
ENSG00000000938     HSB152  -1.60663921     1585
ENSG00000000938     HSB425  -0.40209856     3536
ENSG00000000938     HSB627  -1.09598712     2244
ENSG00000000938     HSB645  -0.82846242     2666
ENSG00000000971     HSB154   4.61434903     53421
ENSG00000000971     HSB154   4.61434903     53421
ENSG00000000971     HSB154   4.61434903     53421
ENSG00000000971     HSB195   2.45561878     18041
ENSG00000000971     HSB222   5.54389646     79697
claudiadast
  • `... %>% mutate(Rank = 1:n())`, this relies on your `arrange` ordering. Or using the `rank` function (which ranks low to high, so we need a negative) `... %>% mutate(Rank = rank(-Expression))`, which just uses the `Expression` values so it doesn't depend on the ordering. Also `rank` has several options for dealing with ties. – Gregor Thomas Nov 28 '18 at 17:07
  • @Gregor: Do you mind providing a complete example? I've tried variations of what you suggested but am still not having any luck. – claudiadast Nov 28 '18 at 17:32
  • From your edit it looks like you've loaded `plyr` after `dplyr` and ignored the warnings. [See this R-FAQ](https://stackoverflow.com/q/26106146/903061). Run `detach(package:plyr)` or specify `dplyr::mutate` and it should work. – Gregor Thomas Nov 28 '18 at 17:34
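
A complete version of the two suggestions in the first comment might look like the sketch below. The data frame is copied from the INPUT table in the question. The `ties.method = "min"` argument, the `seq_len(n())` spelling of `1:n()`, and the explicit `dplyr::` prefixes (to sidestep the `plyr` masking mentioned above) are assumptions added for illustration, not something the commenters wrote.

```r
library(dplyr)

# Data copied from the INPUT table in the question
dat <- data.frame(
  Gene = c("ENSG00000000027", "ENSG00000000938", "ENSG00000000003",
           "ENSG00000000938", "ENSG00000000938", "ENSG00000000938",
           "ENSG00000000003"),
  Expression = c(2.79336700, 0.83478860, 2.40009100, 1.75148448,
                 1.52182467, 0.62174432, 2.81561500),
  Sample = c("HSB431", "HSB414", "HSB618", "HSB671",
             "HSB670", "HSB459", "HSB671"),
  stringsAsFactors = FALSE
)

# Option 1: rank on the values themselves; row order does not matter.
# rank() goes low to high, so negate Expression to make the largest value rank 1.
ranked <- dat %>%
  dplyr::group_by(Gene) %>%
  dplyr::mutate(Rank = rank(-Expression, ties.method = "min")) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(Gene, Rank)

# Option 2: sort first, then number the rows inside each gene.
# This depends entirely on the arrange() ordering.
ranked_by_position <- dat %>%
  dplyr::arrange(Gene, dplyr::desc(Expression)) %>%
  dplyr::group_by(Gene) %>%
  dplyr::mutate(Rank = seq_len(dplyr::n())) %>%
  dplyr::ungroup()

print(ranked)
```

With no tied Expression values inside a gene, both options give the same Rank column. They only differ when ties occur: option 1 gives tied rows the same rank, while option 2 numbers them in whatever order the sort left them.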

2 Answers

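We can use `dense_rank`, wrapping `Expression` in `desc()` so that the highest value in each gene gets rank 1:

library(dplyr)

dat %>% 
  group_by(Gene) %>%
  # rank within each gene; desc() makes the largest Expression rank 1
  mutate(Rank = dense_rank(desc(Expression))) %>% 
  arrange(Gene, Rank)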
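
Which ranking helper to pick mostly matters when a gene has tied Expression values, as in the repeated 4.61434903 rows of the UPDATE output. The sketch below compares three dplyr helpers on a small vector; the values in `x` are hypothetical and chosen only to contain one tie.

```r
library(dplyr)

# Hypothetical Expression values for a single gene, with one tie at 4.6
x <- c(2.5, 4.6, 4.6, 5.5)

# desc() flips the ordering so that the largest value is ranked first
by_row_number <- dplyr::row_number(dplyr::desc(x))  # ties broken by position
by_min_rank   <- dplyr::min_rank(dplyr::desc(x))    # ties share a rank, a gap follows
by_dense_rank <- dplyr::dense_rank(dplyr::desc(x))  # ties share a rank, no gap

print(by_row_number)  # 4 2 3 1
print(by_min_rank)    # 4 2 2 1
print(by_dense_rank)  # 3 2 2 1
```

`dense_rank` keeps the ranks consecutive, `min_rank` matches the usual "competition" ranking, and `row_number` guarantees a unique rank per row.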
akrun

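The following worked once `mutate` was called explicitly as `dplyr::mutate`:

dat_rank <- dat %>% 
        group_by(Gene) %>%
        arrange(Gene, desc(Expression)) %>% 
        # dplyr::mutate respects the grouping, so Rank restarts at 1 for each gene
        dplyr::mutate(Rank = 1:n())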
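
To check whether a bare `mutate` is being masked by another package such as `plyr`, something like the following sketch may help. It uses only base R and utils functions; the variable names are illustrative, and the result shown in the comment assumes `plyr` has not been attached after `dplyr`.

```r
library(dplyr)

# Namespace that a bare `mutate` currently resolves to.
# "dplyr" is expected when nothing attached later masks it.
mutate_source <- environmentName(environment(mutate))
print(mutate_source)

# Every attached package that defines `mutate`, in search-path order;
# the first entry is the one a bare call will use.
mutate_providers <- find("mutate")
print(mutate_providers)
```

If the first entry is `package:plyr`, either detach it as suggested in the comments or keep the `dplyr::` prefix on the affected calls.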
claudiadast