0

I have such a data frame:

dat <- data.frame(var1 = rnorm(10), var2 = rnorm(10), var3 = rnorm(10), var4 = rnorm(10))
> dat
         var1        var2        var3       var4
1  -1.3784414  1.06816022  1.46578217 -0.4141153
2  -0.3272332 -0.69470574  0.02220395 -0.5502878
3   0.2559891 -0.06964848 -0.34745180  0.6399705
4   0.6029044  1.23680560 -0.72392358 -0.1990832
5   1.3097174 -0.58028595 -0.01487186 -0.8765290
6  -1.2356668  0.41330063 -1.00375989 -1.1974204
7  -0.4126320  3.83320678 -1.42059022 -0.6747575
8   1.7339653  0.58610348  0.40200428  1.4582103
9   1.2994859  1.65355306  0.75985071  0.6455882
10 -0.2353356  2.04468739 -0.11521602  0.3251901

The aim is to create a new column with the name of the column that contains the maximum value in each row within columns var2, var3 and var4.

Using the following command does not result in the correct output:

library(dplyr)
dat %>%
  rowwise() %>%
  mutate(var.max = colnames(.)[which.max(c_across(var2:var4))])

# A tibble: 10 x 5
# Rowwise: 
     var1    var2    var3   var4 var.max
    <dbl>   <dbl>   <dbl>  <dbl> <chr>  
 1 -1.38   1.07    1.47   -0.414 var2   
 2 -0.327 -0.695   0.0222 -0.550 var2   
 3  0.256 -0.0696 -0.347   0.640 var3   
 4  0.603  1.24   -0.724  -0.199 var1   
 5  1.31  -0.580  -0.0149 -0.877 var2   
 6 -1.24   0.413  -1.00   -1.20  var1   
 7 -0.413  3.83   -1.42   -0.675 var1   
 8  1.73   0.586   0.402   1.46  var3   
 9  1.30   1.65    0.760   0.646 var1   
10 -0.235  2.04   -0.115   0.325 var1  

But if the column var1 is excluded from the data it works:

dat %>%
  select(-var1) %>%
  rowwise() %>%
  mutate(var.max = colnames(.)[which.max(c_across(var2:var4))])

# A tibble: 10 x 4
# Rowwise: 
      var2    var3   var4 var.max
     <dbl>   <dbl>  <dbl> <chr>  
 1  1.07    1.47   -0.414 var3   
 2 -0.695   0.0222 -0.550 var3   
 3 -0.0696 -0.347   0.640 var4   
 4  1.24   -0.724  -0.199 var2   
 5 -0.580  -0.0149 -0.877 var3   
 6  0.413  -1.00   -1.20  var2   
 7  3.83   -1.42   -0.675 var2   
 8  0.586   0.402   1.46  var4   
 9  1.65    0.760   0.646 var2   
10  2.04   -0.115   0.325 var2  

.. just like when var1 is at the last position:

dat %>%
  select(var2, var3, var4, var1) %>%
  rowwise() %>%
  mutate(var.max = colnames(.)[which.max(c_across(var2:var4))])

# A tibble: 10 x 5
# Rowwise: 
      var2    var3   var4   var1 var.max
     <dbl>   <dbl>  <dbl>  <dbl> <chr>  
 1  1.07    1.47   -0.414 -1.38  var3   
 2 -0.695   0.0222 -0.550 -0.327 var3   
 3 -0.0696 -0.347   0.640  0.256 var4   
 4  1.24   -0.724  -0.199  0.603 var2   
 5 -0.580  -0.0149 -0.877  1.31  var3   
 6  0.413  -1.00   -1.20  -1.24  var2   
 7  3.83   -1.42   -0.675 -0.413 var2   
 8  0.586   0.402   1.46   1.73  var4   
 9  1.65    0.760   0.646  1.30  var2   
10  2.04   -0.115   0.325 -0.235 var2 

What am I missing here?

erc
  • 10,113
  • 11
  • 57
  • 88
  • 2
    Try `names(dat)[max.col(dat)]` – Sotos Feb 23 '23 at 12:15
  • Does this answer your question? [Use dplyr to add a new column of based on max row value?](https://stackoverflow.com/questions/52571879/use-dplyr-to-add-a-new-column-of-based-on-max-row-value) – jwalton Feb 23 '23 at 12:25
  • Doesn't work (or I don't use it correctly?): Error in `mutate()`: ! Problem while computing `..1 = names(dat)[max.col(dat)]`. x `..1` must be size 1, not 10. i Did you mean: `..1 = list(names(dat)[max.col(dat)])` ? i The error occurred in row 1. – erc Feb 23 '23 at 12:28
  • An easy fix for this data you shared is to add +1 at the `which.max()` indexing, i.e. `mutate(var.max = colnames(.)[which.max(c_across(var2:var4))] + 1)` – Sotos Feb 23 '23 at 12:29
  • @Sotos Error in `mutate()`: ! Problem while computing `var.max = colnames(.)[which.max(c_across(var2:var4))] + 1`. i The error occurred in row 1. Caused by error in `colnames(.)[which.max(c_across(var2:var4))] + 1`: ! non-numeric argument to binary operator – erc Feb 23 '23 at 12:32
  • The +1 inside the indexing...`dat %>% rowwise() %>% mutate(max_col = names(dat)[which.max(c_across(var2:var4)) + 1]` – Sotos Feb 23 '23 at 12:33

2 Answers2

3

To continue your logic and since you are only removing the firsrt column, just add 1 to which.max(), i.e.

library(dplyr)

dat %>% 
 rowwise() %>% 
 mutate(max_col = names(dat)[which.max(c_across(var2:var4)) + 1])

# A tibble: 10 × 5
# Rowwise: 
       var1     var2     var3     var4 max_col
      <dbl>    <dbl>    <dbl>    <dbl> <chr>  
 1 -1.09     0.768    0.251   -2.67    var2   
 2 -0.822   -1.37     0.901    1.83    var4   
 3  0.0280  -0.00555 -0.0709   0.729   var4   
 4  1.45    -0.132   -2.47     1.45    var4   
 5  0.506   -1.31    -2.75    -0.264   var4   
 6 -0.00538  1.31    -0.368    0.00679 var2   
 7 -0.166   -0.976   -1.42     1.50    var4   
 8 -0.377   -0.101    0.135    0.784   var4   
 9  0.535    0.438    0.0597   0.924   var4   
10  0.281   -0.481   -0.00177 -0.601   var3   

If you want to do it by specifying which columns to consider then,

my_cols <- c('var2', 'var3', 'var4')

dat %>%
     rowwise() %>%
     mutate(max_col = names(dat)[which.max(c_across(names(dat)[names(dat) %in% my_cols])) + (ncol(dat) - length(my_cols))])
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • OK, this works but seems a bit dodgy to me. Why is it not working correctly in the first place? Would be great to have a solution that does not depend on the order of the columns. – erc Feb 23 '23 at 12:38
  • @erc: `which.max()` returns the positon inside the vector of four elements: `c_across(var2:var4)`. When `var2` is the maximum it will return `1`. We have to add `+1` when we subset the `colnames()` of `dat` otherwise it will show `var1` since it is the first position. – TimTeaFan Feb 23 '23 at 12:41
  • @TimTeaFan So, is there an option to extract the column name directly without indexing? – erc Feb 23 '23 at 12:47
  • @erc: we can write a custom function to avoid adding the number of colums we left out to our index from `which.max()`, see my answer below. – TimTeaFan Feb 23 '23 at 13:10
1

If you want to avoid adding the number of columns which are left out (in the above case +1) then we can write a custom function max_col_name() using across() or pick():

library(dplyr)

max_col_name <- function(...) {
  row_dat <- across(c(...)) # if dplyr v >= v 1.1. use `pick()` instead of `across()`
  names(row_dat)[which.max(row_dat)]
}

dat %>% 
  rowwise() %>% 
  mutate(max_col = max_col_name(var2:var4))

#> # A tibble: 10 x 5
#> # Rowwise: 
#>       var1   var2   var3    var4 max_col
#>      <dbl>  <dbl>  <dbl>   <dbl> <chr>  
#>  1 -0.560   1.22  -1.07   0.426  var2   
#>  2 -0.230   0.360 -0.218 -0.295  var2   
#>  3  1.56    0.401 -1.03   0.895  var4   
#>  4  0.0705  0.111 -0.729  0.878  var4   
#>  5  0.129  -0.556 -0.625  0.822  var4   
#>  6  1.72    1.79  -1.69   0.689  var2   
#>  7  0.461   0.498  0.838  0.554  var3   
#>  8 -1.27   -1.97   0.153 -0.0619 var3   
#>  9 -0.687   0.701 -1.14  -0.306  var2   
#> 10 -0.446  -0.473  1.25  -0.380  var3

Data from the OP

set.seed(123)
dat <- data.frame(var1 = rnorm(10), var2 = rnorm(10), var3 = rnorm(10), var4 = rnorm(10))

Created on 2023-02-23 by the reprex package (v2.0.1)

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39