
Surprised I have never come across this issue before...

What is the correct way to do operations across columns with dplyr? I want the row-wise result that #2 computes, but that approach becomes impractical as operations get more complex and involve more columns. Is there a more concise syntax, along the lines of #1?

library(dplyr)

#1

data.frame(a = c(1:5, 6:10),
           b = c(6:10, 1:5)) %>% 
  mutate(MAX_COLUMN = max(a, b))

#2

data.frame(a = c(1:5, 6:10),
           b = c(6:10, 1:5)) %>% 
  mutate(MAX_COLUMN = ifelse(a > b, a, b))
  • Also see https://stackoverflow.com/q/49396267/5325862 and https://stackoverflow.com/q/21818181/5325862 – camille Nov 29 '20 at 01:50

1 Answer


For a general solution, add rowwise() so that each row is treated as its own group:

library(dplyr)

data.frame(a = c(1:5, 6:10),
           b = c(6:10, 1:5)) %>% 
  rowwise() %>%
  mutate(MAX_COLUMN = max(c_across(a:b)))

#      a     b MAX_COLUMN
#   <int> <int>      <int>
# 1     1     6          6
# 2     2     7          7
# 3     3     8          8
# 4     4     9          9
# 5     5    10         10
# 6     6     1          6
# 7     7     2          7
# 8     8     3          8
# 9     9     4          9
#10    10     5         10
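
To make the difference from #1 concrete, here is a minimal sketch that runs both variants on the same data. The names `df`, `collapsed` and `per_row` are made up for illustration. Without rowwise(), max(a, b) pools both columns into a single value that is recycled down the column; with rowwise(), the same call is evaluated once per row.

```r
library(dplyr)

# Same data as in the question; `df` is just an illustrative name
df <- data.frame(a = c(1:5, 6:10),
                 b = c(6:10, 1:5))

# Without rowwise(): max() sees all of `a` and all of `b` at once,
# so it returns one overall maximum that mutate() recycles to every row
collapsed <- df %>%
  mutate(MAX_COLUMN = max(a, b))

# With rowwise(): each row is its own group, so max() only sees
# the values of a and b from that row
per_row <- df %>%
  rowwise() %>%
  mutate(MAX_COLUMN = max(c_across(a:b))) %>%
  ungroup()  # drop the row-wise grouping once it is no longer needed
```

Calling ungroup() at the end is optional here, but it avoids later verbs being unexpectedly evaluated row by row.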

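If you only need the maximum, a faster option is the vectorised pmax(), called through do.call() so that every column of the data frame is passed to it as an argument: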

data.frame(a = c(1:5, 6:10),
           b = c(6:10, 1:5)) %>% 
  mutate(MAX_COLUMN = do.call(pmax, .))
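
Because do.call(pmax, .) hands every column to pmax(), it may not be what you want when the data frame has other columns. The sketch below assumes a hypothetical extra column `id` and a hypothetical character vector `cols` naming the columns to compare; neither is part of the original question.

```r
library(dplyr)

# `id` is a made-up extra column that should not take part in the maximum
df <- data.frame(a  = c(1:5, 6:10),
                 b  = c(6:10, 1:5),
                 id = letters[1:10])

# With only a few columns, pmax() can be called on them directly
direct <- df %>%
  mutate(MAX_COLUMN = pmax(a, b))

# With many columns, subset the data frame first and pass only
# those columns to pmax() via do.call()
cols <- c("a", "b")  # hypothetical selection of columns
subset_max <- df %>%
  mutate(MAX_COLUMN = do.call(pmax, .[cols]))
```

Both variants stay vectorised, so they avoid the per-row overhead of rowwise().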
Ronak Shah