I would like to multiply several columns of a data frame by the values of a vector: every value within a column should be multiplied by the same number, which differs from column to column. The other columns should stay as they are.

Since I use dplyr extensively, I thought the mutate_each function might be useful, because it can modify all columns at once, but I am completely lost on the syntax of the funs() part.

On the other hand, I've read this solution, which is simple and works fine, but it applies to all columns rather than to selected ones.

This is what I've done so far:

Imagine that I want to multiply every column in df except letters by the weight_df vector, as follows:

df = data.frame(
  letters = c("A", "B", "C", "D"),
  col1 = c(3, 3, 2, 3),
  col2 = c(2, 2, 3, 1),
  col3 = c(4, 1, 1, 3)
)
> df
  letters col1 col2 col3
1       A    3    2    4
2       B    3    2    1
3       C    2    3    1
4       D    3    1    3
> 
weight_df = c(1:3)

If I use select before applying mutate_each, I get rid of the letters column (as expected), which is not what I want. On top of that, the vector is applied row by row rather than column by column, and I want the opposite:

df = df %>% 
  select(-letters) %>% 
  mutate_each(funs(. * weight_df))
> df
  col1 col2 col3
1    3    2    4
2    6    4    2
3    6    9    3
4    3    1    3

But if I don't select any particular columns, all values within letters are removed (which makes a lot of sense, by the way), and that's not what I want either. Again, the vector is applied row by row rather than column by column:

df = df %>% 
  mutate_each(funs(. * weight_df))
> df
  letters col1 col2 col3
1      NA    3    2    4
2      NA    6    4    2
3      NA    6    9    3
4      NA    3    1    3

(Please note that this is a very simple data frame. The original one has many more rows and columns, which unfortunately are not labeled so conveniently and follow no pattern.)

ccamara
  • I don't think the multiplication is doing what you expect. – James Dec 20 '16 at 17:06
  • You'll probably want to look at the purrr package, from the same family/authors, which I guess can help you "map" columns to the corresponding scalars you're multiplying by. – Frank Dec 20 '16 at 17:08
  • No, you're right, please read the last note. – ccamara Dec 20 '16 at 17:08
  • What about `df[,-1] * matrix(rep(weight_df,nrow(df)),nrow = nrow(df),byrow = T)` ? – user2100721 Dec 20 '16 at 17:20
  • thanks but your suggestion gets rid of the `letters` column, and I need it too. – ccamara Dec 20 '16 at 19:13
  • why not just `df[-1] <- t(t(df[-1]) * weight_df)` like in the linked question? Considering the fact that you've accepted a non-dplyr solution eventually I feel like this is just a dupe – David Arenburg Dec 20 '16 at 19:41

3 Answers

6

The problem here is that you are basically trying to operate over rows rather than columns, so methods such as mutate_* won't work. If you are not satisfied with the many vectorized approaches proposed in the linked question, one way to achieve this with the tidyverse (assuming that letters is a unique identifier) is to convert to long form first, multiply the single value column within each group, and then convert back to wide form. I don't think this will be particularly efficient, though.

library(tidyr)
library(dplyr)

df %>% 
  gather(variable, value, -letters) %>%
  group_by(letters) %>%
  mutate(value = value * weight_df) %>%
  spread(variable, value)

#Source: local data frame [4 x 4]
#Groups: letters [4]

#     letters  col1  col2  col3
# *    <fctr> <dbl> <dbl> <dbl>
#   1       A     3     4    12
#   2       B     3     4     3
#   3       C     2     6     3
#   4       D     3     2     9 
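
The group-wise multiplication above relies on each group holding its values in column order. As a hedged variation (not part of the original answer), the weight can instead be looked up by column name. This sketch assumes tidyr 1.0.0 or later for pivot_longer() and pivot_wider(), and the named vector `weights` is a hypothetical stand-in for weight_df:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  letters = c("A", "B", "C", "D"),
  col1 = c(3, 3, 2, 3),
  col2 = c(2, 2, 3, 1),
  col3 = c(4, 1, 1, 3)
)

# Assumption: weights are named after the columns they apply to
weights <- c(col1 = 1, col2 = 2, col3 = 3)

result <- df %>%
  # long form: one row per (letters, variable) pair
  pivot_longer(cols = -letters, names_to = "variable", values_to = "value") %>%
  # pick each row's weight by the name of the column it came from
  mutate(value = value * unname(weights[variable])) %>%
  # back to wide form, one column per original variable
  pivot_wider(names_from = variable, values_from = value)

result
```

Because the lookup is by name, the order of the entries in `weights` does not matter, and any column left out of `cols` passes through unchanged.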
David Arenburg
2

Try this:

library(plyr)
library(dplyr)

df %>% select_if(is.numeric) %>% adply(., 1, function(x) x * weight_df)
manotheshark
  • what package does `adply()` belong to? – mabdrabo Dec 20 '16 at 18:52
  • didn't know `adply()` before. For what I see it belongs to `plyr` – ccamara Dec 20 '16 at 19:06
  • @manotheshark, it works fine except for the fact that select removes the column(s) that are not numerical, and I need them for other users (Although I don't want to perform operations with it). – ccamara Dec 20 '16 at 19:07
  • @ccamara see David's answer using `tidyr` to retain the non-numerical columns. It could also be applied to this answer, but his answer will probably be more legible. – manotheshark Dec 20 '16 at 21:21
2

Using dplyr. This picks out the numeric columns only, which gives flexibility in choosing columns, and returns the new values along with all the other (non-numeric) columns.

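library(dplyr)  # only needed for the %>% pipe
index <- which(sapply(df, is.numeric))  # positions of the numeric columns
# sweep over margin 2: column j is multiplied by weight_df[j]
df[,index] <- df[,index] %>% sweep(2, weight_df, FUN="*")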

> df
  letters col1 col2 col3
1       A    3    4   12
2       B    3    4    3
3       C    2    6    3
4       D    3    2    9
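
To make the sweep() step less opaque, here is a minimal base R sketch (an illustration, not part of the original answer) that compares it with an explicit loop over columns. The helper names `num`, `m`, `swept` and `by_hand` are made up for this example:

```r
df <- data.frame(
  letters = c("A", "B", "C", "D"),
  col1 = c(3, 3, 2, 3),
  col2 = c(2, 2, 3, 1),
  col3 = c(4, 1, 1, 3)
)
weight_df <- 1:3

num <- sapply(df, is.numeric)   # TRUE for col1..col3, FALSE for letters
m <- as.matrix(df[num])         # numeric part as a matrix

# MARGIN = 2 means "along columns": column j is multiplied by weight_df[j]
swept <- sweep(m, 2, weight_df, FUN = "*")

# The same computation written out one column at a time
by_hand <- m
for (j in seq_along(weight_df)) {
  by_hand[, j] <- m[, j] * weight_df[j]
}

stopifnot(isTRUE(all.equal(swept, by_hand)))

df[num] <- swept                # write back; letters is left untouched
df
```

A plain `m * weight_df` would not do the same thing, because R recycles the vector down each column, which is the row-wise pattern seen in the question.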
mabdrabo
  • Thanks! Operations work fine, despite the fact that I get NA values for the `letters` column – ccamara Dec 20 '16 at 19:09
  • why is that? I tried on the sample given and gave the correct `letters` values – mabdrabo Dec 20 '16 at 19:31
  • Hum... I guess I had corrupted my `df` trying other responses. Repeated everything from scratch and works fine, although I have to admit it looks like black magic and I don't understand what your script does :( – ccamara Dec 20 '16 at 19:36
  • Which part is using dplyr here exactly? – David Arenburg Dec 20 '16 at 19:37