Here's an approach similar to Steven's, but it uses dplyr::select()
to state explicitly which columns to include or ignore (such as ID variables).
library(dplyr)

data %>%
  mutate(sum = rowSums(dplyr::select(., a, b, c), na.rm = TRUE))

# Here's a comparable version that uses R's native pipe (R >= 4.1.0).
data |>
  {\(x)
    mutate(
      x,
      sum = rowSums(dplyr::select(x, a, b, c), na.rm = TRUE)
    )
  }()
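To make the include/ignore point concrete, here's a minimal sketch on a hypothetical toy data frame (`toy` and its `id` column are made up for illustration). The ID column is dropped with a negative selection instead of listing every value column:

```r
library(dplyr)

# Hypothetical toy data: `id` is an identifier and must not be summed.
toy <- data.frame(
  id = c(101, 102, 103),
  a  = c(1, NA, 3),
  b  = c(4, 5, NA),
  c  = c(NA, 8, 9)
)

# Ignore the ID column with a negative selection; all other columns are summed.
# NAs are skipped because of na.rm = TRUE.
out <- toy %>%
  mutate(sum = rowSums(dplyr::select(., -id), na.rm = TRUE))

print(out)
```

Because `id` is numeric here, a type-based selection such as `where(is.numeric)` would not exclude it, which is why naming the column (or listing the value columns as in `select(., a, b, c)`) is the safer choice.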
It performs comparably to Steven's approach on a realistically sized dataset; the median times in the results are nearly identical. I'm not sure why, though, since no columns are actually being excluded in this skinny example.
A bigger dataset of 1M rows:
pick <- function() { sample(c(1:5, NA), 1000000, replace = TRUE) }
data <- data.frame(a = pick(), b = pick(), c = pick())
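The timing code itself isn't shown. The `Unit:` / `neval` / `cld` layout of the results matches the print method of the microbenchmark package, so here's a sketch of a harness under that assumption. It times only this answer's two variants (the labels `magrittr` and `native` and the variables `res_magrittr`, `res_native`, and `bench` are hypothetical), because the other answers' expressions aren't reproduced here. It also checks first that both pipe styles return the same result:

```r
library(dplyr)
library(microbenchmark)

# Same generator as above: 1M values drawn from 1:5 plus NA.
# Note: this global `pick` masks dplyr::pick() (dplyr >= 1.1.0) for the session.
pick <- function() { sample(c(1:5, NA), 1000000, replace = TRUE) }
data <- data.frame(a = pick(), b = pick(), c = pick())

# Sanity check: both pipe styles should return the same data frame.
res_magrittr <- data %>%
  mutate(sum = rowSums(dplyr::select(., a, b, c), na.rm = TRUE))
res_native <- data |>
  (\(x) mutate(x, sum = rowSums(dplyr::select(x, a, b, c), na.rm = TRUE)))()
stopifnot(identical(res_magrittr, res_native))

# Hypothetical harness: times = 10 matches the neval column in the results.
bench <- microbenchmark(
  magrittr = data %>%
    mutate(sum = rowSums(dplyr::select(., a, b, c), na.rm = TRUE)),
  native = data |>
    (\(x) mutate(x, sum = rowSums(dplyr::select(x, a, b, c), na.rm = TRUE)))(),
  times = 10
)
print(bench)
```

The anonymous function is wrapped in parentheses here rather than braces; both forms give the native pipe a call to insert `data` into.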
Results:
Unit: milliseconds
     expr         min          lq        mean      median          uq         max neval cld
   steven    22.05847    22.96164    56.84822    28.85411    54.99691   174.58447    10 a
wibeasley    25.10274    26.98303    30.66911    29.30630    30.63343    49.46048    10 a
      lyz 10408.89904 10548.33756 10887.51930 10720.92372 11017.56256 12250.41370    10 c
      nar  1975.35941  2011.36445  2123.81705  2090.43174  2172.80501  2362.13658    10 b
    akrun    31.27247    35.41943    81.33320    57.93900    63.59119   302.21059    10 a
    frank    37.48265    38.72270    65.02965    41.62735    44.45775   261.79898    10 a