1

Sample of df:

df <- tibble(name = LETTERS[1:10],
              x = rnorm(10, mean = 10), 
              y = rnorm(10, 10), 
              z = rnorm(10, 10))

I would like to mutate ranked columns for x, then the sums of cols x and y, then x and y and z, where the bigger numbers are ranked 1, then the smallest numbers 10.

Starting with x, I could do something like:

df %<>% mutate(rank_01 = min_rank(-x))

Which computes the ranked column for x, but then I'm not sure what the best process would be to compute the latter columns. I'm guessing taking advantage of vectorisation somehow, but my programming skills are limited here.

In my real df, the total number of cols I would like to do this with is >50, so an automated process is ideal!

Expected output:

# A tibble: 10 x 7
   name      x rank_01     y rank_02     z rank_03
 * <chr> <dbl>   <dbl> <dbl>   <dbl> <dbl>   <dbl>
 1 A      9.37       8 11.5        4 10.9        2
 2 B     10.2        6 10.4        5 10.8        3
 3 C      9.16      10  9.38      10 10.1        9
 4 D     11.6        1  7.79       8  8.01      10
 5 E     10.3        5 11.1        2 10.6        1
 6 F      9.18       9  9.96       9  9.94       8
 7 G     10.5        4  9.98       6  9.84       6
 8 H     10.7        2 10.9        1  8.53       7
 9 I     10.6        3 10.8        3  9.52       4
10 J      9.69       7 10.6        7 10.4        5
Nautica
  • 2,004
  • 1
  • 12
  • 35

3 Answers3

2
cbind(df, apply(-apply(df[, -1], 1, cumsum), 1, rank))
#    name         x         y         z  x  y  z
# 1     A 10.049312 10.424365  9.286644  5  4  5
# 2     B 10.010068 10.996667  8.754025  6  1  4
# 3     C  9.813097  9.493180 10.651993  9  7  3
# 4     D 10.702742  9.657496  9.838946  3  5  2
# 5     E  9.936206  9.047051  8.938002  7 10 10
# 6     F  9.833105  9.205973 10.627177  8  9  6
# 7     G 11.310733  9.262942  8.931759  2  3  7
# 8     H 11.316306  8.576866 12.390953  1  6  1
# 9     I  9.044812 10.251189  9.606649 10  8  9
# 10    J 10.495743 10.174724  8.458670  4  2  8

You may also want to set the column names to something like rank_x, _rank_xy, etc. See Cumulatively paste (concatenate) values grouped by another variable for that. E.g.,

paste0("rank_", Reduce(paste0, names(df)[-1], accumulate = TRUE))
# [1] "rank_x"   "rank_xy"  "rank_xyz"
Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • Thanks, can I ask what the purposes of the 1 are in the two apply calls? For learning purposes. My guess is that it transposes the selected vectors (`x`,`y`,`z`) in the first brackets to allow the `cumsum` to work long-wise on the vectors, then transposes back to it's original form in the second call. – Nautica Oct 27 '18 at 20:51
  • 1
    As it says in `?apply`, the second argument is *a vector giving the subscripts which the function will be applied over*. So, the inner `apply` goes across columns and uses `cumsum`. However, next we want to go over rows and rank the terms, so we want to go over the other dimension and need another `apply`. It happens that the first `apply` returns a transposed result, so actually we again go over columns... It may be confusing, so I suggest to run the two `apply` in two steps to see what happens. – Julius Vainora Oct 27 '18 at 20:55
1

A different approach using tidyverse and reshape2:

df %>% 
  gather(var, val, -name) %>% 
  arrange(name) %>% 
  group_by(name) %>% 
  mutate(temp = cumsum(val)) %>% 
  ungroup() %>%
  dcast(name~var, value.var = "temp") %>%
  mutate_at(vars(-name), funs(rank = dense_rank(desc(.)))) %>%
  select(matches("(_rank)|(name)")) %>%
  left_join(df, by = c("name" = "name"))

   name x_rank y_rank z_rank         x         y         z
1     A      1      3      9 11.668095  9.645292  6.977697
2     B      3      1      1 11.085743 12.395033  9.130904
3     C      4      4      3 10.557528 10.551010  9.586108
4     D     10      8      2  8.363167 11.248786 11.989218
5     E      6      7      6  9.728462 10.049470  9.921010
6     F      2      5      7 11.091799  9.544451  8.516171
7     G      7      6      4  9.686247 10.657889  9.713129
8     H      8     10     10  9.317976  8.514533  9.098976
9     I      5      2      5 10.052081 11.469185  8.425983
10    J      9      9      8  9.290704  9.778239  9.331685

Or if you want column names that indicate cumulation:

df %>% 
  gather(var, val, -name) %>% 
  arrange(name) %>% 
  group_by(name) %>% 
  mutate(temp = cumsum(val),
         var = paste0(Reduce(paste0, var, accumulate = TRUE))) %>% 
  ungroup() %>%
  dcast(name~var, value.var = "temp") %>%
  mutate_at(vars(-name), funs(rank = dense_rank(desc(.)))) %>%
  select(matches("(_rank)|(name)")) %>%
  left_join(df, by = c("name" = "name"))

   name x_rank xy_rank xyz_rank         x         y         z
1     A      1       3        9 11.668095  9.645292  6.977697
2     B      3       1        1 11.085743 12.395033  9.130904
3     C      4       4        3 10.557528 10.551010  9.586108
4     D     10       8        2  8.363167 11.248786 11.989218
5     E      6       7        6  9.728462 10.049470  9.921010
6     F      2       5        7 11.091799  9.544451  8.516171
7     G      7       6        4  9.686247 10.657889  9.713129
8     H      8      10       10  9.317976  8.514533  9.098976
9     I      5       2        5 10.052081 11.469185  8.425983
10    J      9       9        8  9.290704  9.778239  9.331685
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
1

Another approach with tidyverse

library(tidyverse)
pmap(df[,-1], ~ cumsum(c(...)) %>%
          as.tibble) %>% 
          bind_cols %>% 
          pmap(., ~ -c(...) %>% 
                rank%>% 
                as.tibble) %>%
     bind_cols(df, .) %>% 
     rename_at(vars(matches("value")), ~ paste0("rank", sprintf("_%02d", 1:3)))
akrun
  • 874,273
  • 37
  • 540
  • 662