In this blog post, Paul Hiemstra shows how to sum two columns using dplyr::mutate_. Copying the relevant parts:
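library(dplyr)     # provides %>% and mutate_()
library(lazyeval)
f = function(col1, col2, new_col_name) {
    # build the expression col1 + col2 from the column names given as strings
    mutate_call = lazyeval::interp(~ a + b, a = as.name(col1), b = as.name(col2))
    mtcars %>% mutate_(.dots = setNames(list(mutate_call), new_col_name))
}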
allows one to then do:
head(f('wt', 'mpg', 'hahaaa'))
Great!
I followed up with a question (see comments) about how one could extend this to 100 columns, since it wasn't clear to me how to do that with the above method without typing out all the names. Paul was kind enough to indulge me and provided this answer (thanks!):
# data
df = data.frame(matrix(1:100, 10, 10))
names(df) = LETTERS[1:10]
# answer
sum_all_rows = function(list_of_cols) {
    summarise_calls = sapply(list_of_cols, function(col) {
        lazyeval::interp(~col_name, col_name = as.name(col))
    })
    df %>% select_(.dots = summarise_calls) %>% mutate(ans1 = rowSums(.))
}
sum_all_rows(LETTERS[sample(1:10, 5)])
I'd like to improve this answer on these points:
1. The other columns are gone. I'd like to keep them.
2. It uses rowSums(), which has to coerce the data.frame to a matrix, which I'd like to avoid. Also, I'm not sure whether the use of . within non-do() verbs is encouraged, because . within mutate() doesn't seem to adapt to just those rows when used with group_by().
3. And most importantly, how can I do the same using mutate_() instead of mutate()?
I found this answer, which addresses point 1, but unfortunately, both dplyr answers use rowSums() along with mutate().
PS: I just read Hadley's comment under that answer. IIUC, 'reshape to long form + group by + sum + reshape to wide form' is the recommended dplyr way for these types of operations?
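To make points 1 and 2 concrete, here is a minimal base-R sketch of the behaviour I'm after. It is not a dplyr solution, and the helper names build_sum_call and add_row_sum are made up for illustration. It builds the call A + B + ... from the column names, evaluates it against the data frame (so no matrix coercion is involved), and keeps all the other columns:

```r
# Example data from above: ten columns named A..J.
df <- data.frame(matrix(1:100, 10, 10))
names(df) <- LETTERS[1:10]

# Hypothetical helper: turn c("A", "B", "C") into the unevaluated call A + B + C.
build_sum_call <- function(cols) {
  Reduce(function(lhs, rhs) call("+", lhs, rhs), lapply(cols, as.name))
}

# Hypothetical helper: append the row-wise sum of `cols` as a new column,
# leaving every existing column in place.
add_row_sum <- function(data, cols, new_col_name) {
  data[[new_col_name]] <- eval(build_sum_call(cols), data)
  data
}

result <- add_row_sum(df, c("A", "C", "E"), "ans1")
head(result)
```

In principle the call returned by build_sum_call() could be handed to mutate_() through .dots, the same way f() does above, but I haven't confirmed that this works, which is really what point 3 is asking.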