Given the following data:

df <- data.frame(
  a = c(1,2,3,5),
  b = c(7,9,52,4),
  c = c(53, 11,22,1),
  d = c("something","string","another", "here")
)

which looks like this:

  a  b  c         d
1 1  7 53 something
2 2  9 11    string
3 3 52 22   another
4 5  4  1      here

I would like to create a column "max" using dplyr, where max is the name of the column (among a, b and c) that holds the largest value in each row.

So for the above I would have

  a  b  c         d  max
1 1  7 53 something   c
2 2  9 11    string   c
3 3 52 22   another   b
4 5  4  1      here   a
baxx

3 Answers

We can use max.col to find the column index of the maximum value in each row, use that index to look up the column name, and assign the result as the 'max' column:

df['max'] <- names(df)[1:3][max.col(df[1:3], "first")]
df
#  a  b  c         d max
#1 1  7 53 something   c
#2 2  9 11    string   c
#3 3 52 22   another   b
#4 5  4  1      here   a
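
The second argument to max.col is its ties.method, which matters when a row has two equal maxima (the default is "random"). A minimal sketch of the difference, using a small made-up matrix `m` that is not part of the question's data:

```r
# Hypothetical matrix with ties: row 1 ties in b and c, row 2 ties in a and b.
m <- matrix(c(1, 5, 5,
              3, 3, 2),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# "first" picks the leftmost tied column, "last" the rightmost.
first <- colnames(m)[max.col(m, ties.method = "first")]
last  <- colnames(m)[max.col(m, ties.method = "last")]

print(first)
print(last)
```

Fixing ties.method to "first" or "last" keeps the result reproducible between runs.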

With tidyverse, another approach is to reshape into 'long' format and then find the max

library(dplyr)
library(tidyr)
df %>%
   mutate(ind = row_number()) %>%
   select(-d) %>%
   pivot_longer(cols = a:c) %>%
   group_by(ind) %>%
   slice(which.max(value)) %>%
   select(-value) %>%
   pull(name) %>%
   mutate(df, max = .)

Or with pmap

library(purrr)
df %>% 
   mutate(max = pmap_chr(select(., a:c), ~ c(...) %>% 
                                   which.max %>% 
                                   names ))
akrun
    Thank you for the different approaches. purrr looks interesting, if a little confusing (I have never used it). The other solution using dplyr ([here](https://stackoverflow.com/a/59221774/3130747)) seems a fair bit shorter, so I'm curious whether there were any particular reasons for the two approaches. Is one considered more "dplyr-ish" than the other? – baxx Dec 07 '19 at 01:06
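apply(df, 2, max)  # assuming your data frame is named df
# Note: MARGIN = 2 returns the maximum of each column, not the name of the column holding each row's maximum.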
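
To get the per-row column name that the question asks for, a base R sketch along the same lines could apply which.max over rows (MARGIN = 1), restricted to the numeric columns. The helper vector `num_cols` is a name introduced here for illustration only:

```r
df <- data.frame(
  a = c(1, 2, 3, 5),
  b = c(7, 9, 52, 4),
  c = c(53, 11, 22, 1),
  d = c("something", "string", "another", "here")
)

# Assumption: only a, b and c take part in the comparison.
num_cols <- c("a", "b", "c")

# MARGIN = 1 iterates over rows; which.max gives the position of the
# first maximum within each row.
idx <- apply(df[num_cols], 1, which.max)
df$max <- num_cols[idx]

print(df)
```

Leaving column d out before calling apply matters, because apply converts the data frame to a matrix and a character column would turn every value into a string.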

Jorge Lopez
df %>%
    group_by(ind = row_number()) %>%
    mutate(max = c("a", "b", "c")[which.max(c(a, b, c))]) %>%
    ungroup() %>%
    select(-ind)
## A tibble: 4 x 5
#      a     b     c d         max  
#  <dbl> <dbl> <dbl> <fct>     <chr>
#1     1     7    53 something c    
#2     2     9    11 string    c    
#3     3    52    22 another   b    
#4     5     4     1 here      a
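
As a possible variant of the same idea, assuming dplyr 1.0.0 or later is installed, rowwise() together with c_across() can stand in for grouping by row number. This is a sketch rather than the approach the answer itself uses:

```r
library(dplyr)

df <- data.frame(
  a = c(1, 2, 3, 5),
  b = c(7, 9, 52, 4),
  c = c(53, 11, 22, 1),
  d = c("something", "string", "another", "here")
)

res <- df %>%
  rowwise() %>%                  # treat each row as its own group
  mutate(max = c("a", "b", "c")[which.max(c_across(a:c))]) %>%
  ungroup()                      # drop the row-wise grouping again

print(res)
```

The column names are spelled out by hand so that the lookup does not depend on whether c_across() keeps names on the vector it returns.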
d.b