1

In R, I would like to manipulate (say multiply) data.frame columns with appropriately named values stored in a vector (or data.frame, if that's easier).

Let's say, I want to first summarise the variables disp, hp, and wt from the mtcars dataset.

vars <- c("disp", "hp", "wt")
mtcars %>% 
  summarise_at(vars, funs(sum(.))

(throw a group_by(cyl) into the mix, or use mutate_at if you'd like to have more rows)

Now I'd like to multiply each of the resulting columns with a particular value, given by

multiplier <- c("disp" = 2, "hp" = 3, "wt" = 4)

Is it possible to refer to these within the summarise_at function?

The result should look like this (and I don't want to have to refer to the variable names directly while getting there):

disp    hp    wt
14766.2 14082 411.808

UPDATE:

Maybe my MWE was too minimal. Let's say I want to do the same operation with a data.frame grouped by cyl

mtcars %>% 
  group_by(cyl) %>% 
  summarise_at(vars, sum) 

The result should thus be:

    cyl   disp   hp      wt
1     4 2313.0 2727 100.572
2     6 2566.4 2568  87.280
3     8 9886.8 8787 223.956

UPDATE 2:

Maybe I was not explicit enough here either, but the columns in the data.frame should be multiplied by the respective values in the vector (and only those columns mentioned in the vector), so e.g. disp should be multiplied by 2, hp by 3 and wt by 4, all other variables (e.g. cyl) should remain untouched by the multiplication.

dpprdan
  • 1,727
  • 11
  • 24

2 Answers2

1

We could also do this with map function from purrr

library(purrr)
mtcars %>%
    summarise_at(vars, sum) %>%
    map2_df(multiplier, `*`)
#      disp    hp      wt
#     <dbl> <dbl>   <dbl>
# 1 14766.2 14082 411.808

For the updated question

d1 <- mtcars %>% 
         group_by(cyl) %>% 
         summarise_at(vars, sum) 
d1 %>% 
   select(one_of(vars)) %>% 
   map2_df(multiplier[vars], ~ .x * .y) %>%
   bind_cols(d1 %>% select(-one_of(vars)), .) 
#    cyl   disp    hp      wt
#  <dbl>  <dbl> <dbl>   <dbl>
#1     4 2313.0  2727 100.572
#2     6 2566.4  2568  87.280
#3     8 9886.8  8787 223.956

Or we can use gather/spread

library(tidyr)
mtcars %>% 
    group_by(cyl) %>% 
    summarise_at(vars, sum) %>% 
    gather(var, val, -cyl) %>% 
    mutate(val = val*multiplier[match(var, names(multiplier))]) %>% 
    spread(var, val)
#     cyl   disp    hp      wt
#   <dbl>  <dbl> <dbl>   <dbl>
#1     4 2313.0  2727 100.572
#2     6 2566.4  2568  87.280
#3     8 9886.8  8787 223.956
dpprdan
  • 1,727
  • 11
  • 24
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I might be wrong, but this does not work with a `group_by(cyl)`, does it? I get "Error: `.x` (4) and `.y` (3) are different lengths" – dpprdan Oct 04 '16 at 10:55
  • @dapperdan Your question was not about `group_by` – akrun Oct 04 '16 at 11:01
  • Well, maybe I wasn't explicit enough, but I did mention "(throw a group_by(cyl) into the mix, or use mutate_at if you'd like to have more rows)", didn't I? – dpprdan Oct 04 '16 at 11:21
  • No problem. But there is more to a question than just the code. ;-) I made it more explicit, as you can see. – dpprdan Oct 04 '16 at 11:40
  • :D. That's the right direction, but we are not there quite yet. Because the multiplier vector is named for a reason. If you change it's order to `multiplier <- c("wt" = 4, "disp" = 2, "hp" = 3)` your code multiplies `disp` by 4, `hp` by 2 and `wt` by 3 and that's clearly not what I want. – dpprdan Oct 04 '16 at 12:46
  • @dapperdan In that case just do `multiplier[vars]` – akrun Oct 04 '16 at 12:48
  • Well I'd argue that it was all there in the original question. Just not explicit enough for people who just read the code and not the prose. :P Seriously, thanks a lot for the intro to `purrr`. I also like the `tidyr`approach even though it performs slightly worse. I hope you don't mind me adding a little something to the `select` and `bind_cols` functions to make it more generally applicable? – dpprdan Oct 04 '16 at 14:07
0

I am not sure if you can do this in the summarise_at function, but this is close alternative...

library(dplyr)
library(purrr)

vars <- c("disp", "hp", "wt")
multiplier <- c("disp" = 2, "hp" = 3, "wt" = 4)

mtcars %>% 
    summarise_at(vars, sum) %>% 
    do(. * multiplier) 

     disp    hp      wt
1 14766.2 14082 411.808

****REDUX****

Include the grouping var cyl in the multiplier and set it equal to 1. @akrun's map2_df does the real work here.

vars <- c("disp", "hp", "wt")
multiplier <- c("cyl" = 1, "disp" = 2, "hp" = 3, "wt" = 4)

mtcars %>% 
    group_by(cyl) %>% 
    summarise_at(vars, sum) %>% 
    map2_df(multiplier, ~ .x * .y) 


    cyl   disp    hp      wt
  <dbl>  <dbl> <dbl>   <dbl>
1     4 2313.0  2727 100.572
2     6 2566.4  2568  87.280
3     8 9886.8  8787 223.956
conor
  • 1,267
  • 10
  • 7
  • Not bad, but when I have more rows (e.g. by adding add a `group_by(cyl)`) these get multiplied row-wise. And additional columns (here `cyl`) get multiplied as well. I just want to multiply the columns mentioned in the named vector with the values in that vector, so disp*2, hp*3, wt*4). – dpprdan Oct 04 '16 at 11:05
  • Well, I think the better alternative to adding `cyl` to `multiplier` is to use `multiplier[vars]` in the `map2_df` because it is more general. This and the rest is really @akrun's answer, so I am marking his as the answer to my question. – dpprdan Oct 04 '16 at 14:22