Using 'mutate_' to sum a bunch of columns row-wise

Question

In this blog post, Paul Hiemstra shows how to sum two columns using dplyr::mutate_. Copying the relevant parts:

library(dplyr)     # provides %>% and mutate_()
library(lazyeval)
f = function(col1, col2, new_col_name) {
    mutate_call = lazyeval::interp(~ a + b, a = as.name(col1), b = as.name(col2))
    mtcars %>% mutate_(.dots = setNames(list(mutate_call), new_col_name))
}

This allows one to then do:

head(f('wt', 'mpg', 'hahaaa'))

Great!
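
To make clearer what the interp() call in f builds, here is a small sketch (the variable name manual_sum is only for illustration). It substitutes the two column names into the formula template, giving an unevaluated expression that can later be evaluated against the data:

```r
library(lazyeval)

# Substitute the symbols wt and mpg for the placeholders a and b
mutate_call <- lazyeval::interp(~ a + b, a = as.name("wt"), b = as.name("mpg"))
print(mutate_call)  # a one-sided formula with wt + mpg on its right-hand side

# Evaluating the right-hand side against mtcars gives the row-wise sum
manual_sum <- eval(mutate_call[[2]], mtcars)
stopifnot(all.equal(manual_sum, mtcars$wt + mtcars$mpg))
```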

I followed up with a question (see comments) as to how one could extend this to 100 columns, since it wasn't quite clear (to me) how one could do it with the above method without having to type all the names. Paul was kind enough to indulge me and provided this answer (thanks!):

# data
df = data.frame(matrix(1:100, 10, 10))
names(df) = LETTERS[1:10]

# answer
sum_all_rows = function(list_of_cols) {
  summarise_calls = sapply(list_of_cols, function(col) {
    lazyeval::interp(~col_name, col_name = as.name(col))
  })
  df %>% select_(.dots = summarise_calls) %>% mutate(ans1 = rowSums(.))
}
sum_all_rows(LETTERS[sample(1:10, 5)])

I'd like to improve this answer on these points:

1. The other columns are gone. I'd like to keep them.
2. It uses rowSums(), which has to coerce the data.frame to a matrix; I'd like to avoid that.
3. I'm not sure whether using . within non-do() verbs is encouraged, because . within mutate() doesn't seem to adapt to just the rows of each group when used with group_by().
4. Most importantly, how can I do the same using mutate_() instead of mutate()?

I found this answer, which addresses point 1, but unfortunately, both dplyr answers use rowSums() along with mutate().

PS: I just read Hadley's comment under that answer. IIUC, 'reshape to long form + group by + sum + reshape to wide form' is the recommended dplyr way for this type of operation?

No need for `library(lazyeval)` when you explicitly qualify its usage anyway. — Konrad Rudolph, Sep 28 '15 at 16:35

talat · Accepted Answer · 2015-09-28T16:09:28.720

Here's a different approach:

library(dplyr); library(lazyeval)
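# Adds `new_col` to `df` as the row-wise sum of the columns named in `list_of_cols`
f <- function(df, list_of_cols, new_col) {
  df %>%
    # `.` is the piped-in data frame; Reduce() adds the selected columns pairwise
    mutate_(.dots = ~Reduce(`+`, .[list_of_cols])) %>%
    # the new column is appended last, so rename it by position
    setNames(c(names(df), new_col))
}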

head(f(mtcars, c("mpg", "cyl"), "x"))
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb    x
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 27.0
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 27.0
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 26.8
#4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 27.4
#5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 26.7
#6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 24.1

Regarding your points:

1. Other columns are kept.
2. It doesn't use rowSums.
3. You are specifically asking for a row-wise operation here, so I'm not sure (yet) how a group_by could do any harm when using . inside mutate/mutate_.
4. It makes use of mutate_.
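
For point 2, the Reduce() idea can be checked in base R on its own, without dplyr. This is only a minimal sketch on the toy data from the question; the names cols and row_sum are made up for illustration:

```r
# Toy data, same shape as in the question
df <- data.frame(matrix(1:100, 10, 10))
names(df) <- LETTERS[1:10]

cols <- c("A", "C", "E")

# df[cols] is still a data.frame (a list of column vectors), so Reduce()
# adds the columns pairwise: (A + C) + E. No matrix is built.
row_sum <- Reduce(`+`, df[cols])

# Same numbers as rowSums(), which coerces to a matrix first
stopifnot(all(row_sum == rowSums(df[cols])))
```

One difference to keep in mind: rowSums() has an na.rm argument, while a plain `+` propagates NA, so rows with missing values would need separate handling.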

edited Sep 28 '15 at 16:09

answered Sep 28 '15 at 16:03

talat

68,970
21
126
157

Great! On `.` with `group_by()`, I just find it odd. As an example case, compute the row sum and divide them by max sum within group.. I guess you'd first compute the row sum and then group by and get the ratio? If so, I find it odd (of not being able to do it in one step using mutate, but using `do()`). But perhaps that's by design, no worries. Thanks. – Arun Sep 28 '15 at 17:39
Hi @docendo discimus. Great answer. Do you know if in the recent releases of dplyr there is some function that adds a column as the sum of columns matching some regular expression? – agenis Sep 20 '17 at 15:01
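
On the last comment: mutate_() has been deprecated since dplyr 0.7.0, and in newer releases a column that sums all columns matching a regular expression could be sketched as below. This assumes dplyr 1.1.0 or later (for pick()); the helper name sum_matching and the pattern are hypothetical, not something from the answer above:

```r
library(dplyr)

# Hypothetical helper: adds `new_col` as the row-wise sum of all columns
# whose names match the regular expression `pattern`.
# Assumes at least one column matches.
sum_matching <- function(df, pattern, new_col) {
  df %>%
    # pick() returns the matching columns as a data frame;
    # Reduce() then adds them pairwise, as in the answer above
    mutate("{new_col}" := Reduce(`+`, pick(matches(pattern))))
}

df <- data.frame(matrix(1:100, 10, 10))
names(df) <- LETTERS[1:10]

out <- sum_matching(df, "^[ABC]$", "abc_sum")
head(out)
```

The other columns are kept and no rowSums() call is involved, so this covers the same points as the mutate_() version.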
