This task seems straightforward, but after reading several answers on Stack Overflow I still can't get the right result, so I need help. I've been studying this post: How to rank within groups in R?

I have data from experiments in which multiple variables are collected, and I need to rank the performance of some products for each experimental condition. Here is a sample of the expected output for the ranking column.

          id           customer     location        fluid water temperature speed time product response ranking
1  103365333 Acme International   Newtown US  light fluid     5         105     8    2   AK125    25.94       1
2  103365333 Acme International   Newtown US  light fluid     5         105     8    2   AK560    25.19       2
3  103365333 Acme International   Newtown US  light fluid     5         105     8    2   PR600    24.56       3
4  103365333 Acme International   Newtown US  light fluid     5         105     8    2   PR300    23.69       4
5  103365333 Acme International   Newtown US  light fluid     5         105     8    2   XY500    23.63       5
6  103365333 Acme International   Newtown US  light fluid     5         105     8    2  XYZ123    22.75       6
7  103365333 Acme International   Newtown US  light fluid     5         105     8    2  ABC567    21.50       7
8  103365333 Acme International   Newtown US  light fluid     5         105     8    2  Z12345    21.50       8
9  103365333 Acme International   Newtown US  light fluid     5         105     8    2  W21450    21.00       9
10 103365333 Acme International   Newtown US  light fluid     5         105     8    2  W21010    20.54      10
11 103365333 Acme International   Newtown US  heavy fluid     5         105     8    2  W20001    19.06      11
12 103365333 Acme International   Newtown US  heavy fluid     5         105     8    2  W22025    15.88      12
13 155259007  New Great Company Ghosttown CA residue good    10         105     8    2   AK125    13.52       1
14 155259007  New Great Company Ghosttown CA residue good    10         120     4    2   AK560     8.75       1
15 155259007  New Great Company Ghosttown CA residue good    10         120     4    2   PR600     6.00       2
16 155259007  New Great Company Ghosttown CA residue good    10         120     4    2   PR300     1.50       3
17 155259007  New Great Company Ghosttown CA residue good    10         120     4    2   XY500     1.50       4
18 155259007  New Great Company Ghosttown CA residue good     5         105     8    2  XYZ123    14.25       1
19 155259007  New Great Company Ghosttown CA residue good     5         105     8    2  ABC567    13.25       2
20 155259007  New Great Company Ghosttown CA residue good     5         105     8    2  Z12345    12.88       3

My goal is to rank the products by response in decreasing order within each experimental condition, as shown in the expected output. Not all products are used in all experiments, which makes it tricky.

I am trying my "standard" code pipe here:

df %>%
  arrange(id, customer, location, fluid, water, temperature, speed, time, -response) %>%
  group_by(id, customer, location, fluid, water, temperature, speed, time) %>%
  mutate(ranking = dense_rank(response))

But all I get is the overall ranking, not a ranking per group. Do you see anything wrong with my code, or is there a limit on the number of variables that can be used in group_by? I've also tried the other ranking functions (which are all based on rank, though). Thanks.

plperez
  • 3
    Please share a small copy/pasteable example with `dput()` - it is much easier for us than external downloads. – Gregor Thomas Feb 11 '18 at 16:17
  • 1
    A couple of recommendations: `dense_rank` uses increasing order by default. If you want decreasing order, use `dense_rank(-response)`. Second, I can't tell if the groupings are part of your problem, but if they are, make sure you aren't accidentally using `plyr::mutate` instead of `dplyr::mutate`. If you loaded `plyr` after `dplyr`, a warning is printed about old plyr functions masking the new dplyr equivalents. You can try calling `dplyr::mutate` explicitly. Look at `conflicts(, T)` to check. – Gregor Thomas Feb 11 '18 at 16:22
  • @Gregor, thank you very much! Indeed, the problem was that I was loading (inadvertently) `plyr::mutate` instead of `dplyr::mutate`. I can't see how to accept your response. – plperez Feb 12 '18 at 16:15
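
For anyone who lands here with the same symptom, the two suggestions from the comments can be sketched roughly as follows. The toy data frame is hypothetical and groups on only two columns (`id` and `temperature`) to keep it short; the real data would list all the experimental variables in `group_by`. This is a minimal sketch that assumes the cause is `plyr::mutate` masking `dplyr::mutate`, not a definitive diagnosis for every case.

```r
library(dplyr)

# Hypothetical toy data: two experimental conditions, identified here by
# `id` and `temperature` only (the original data groups on more columns).
df <- data.frame(
  id          = c(1, 1, 1, 2, 2),
  temperature = c(105, 105, 105, 120, 120),
  product     = c("AK125", "AK560", "PR600", "AK560", "PR600"),
  response    = c(25.94, 25.19, 24.56, 8.75, 6.00)
)

ranked <- df %>%
  group_by(id, temperature) %>%
  # Call dplyr::mutate explicitly so that a later library(plyr) cannot mask it.
  # Negating response makes the highest response rank 1 within each group.
  dplyr::mutate(ranking = dense_rank(-response)) %>%
  ungroup()

print(ranked)

# List masked names per attached package, to check whether plyr masks dplyr.
print(conflicts(detail = TRUE))
```

`plyr::mutate` does not know about dplyr groups, so it computes the rank over the whole data frame, which matches the "overall ranking" described in the question. Note that `dense_rank` gives tied responses the same rank; if ties need distinct ranks, `row_number(-response)` is one alternative.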
