
I have a dataframe with a factor variable identifying my groups (here y) and multiple numerical variables (to simplify, I show only two here, x and z):

library(dplyr)   # mutate(), case_when(), %>%
library(tibble)  # tribble()

df <- tribble(
  ~x, ~y,  ~z,
  1,  "a", 5,
  2,  "b", 6,
  3,  "a", 7,
  4,  "b", 8,
  5,  "a", 9,
  6,  "b", 10
)

I want to add new columns to my dataframe in which I apply different mathematical functions to those numerical variables (x and z), depending on the value of the factor variable (y). For the example dataframe above: every observation with y == "a" has 1 added to it, and every observation with y == "b" has 2 added to it.

This is my code to do it:

df %>% mutate(x_new = case_when(grepl("a", y) ~ x + 1,
                                grepl("b", y) ~ x + 2))

### Output
# A tibble: 6 × 4
      x y         z x_new
  <dbl> <chr> <dbl> <dbl>
1     1 a         5     2
2     2 b         6     4
3     3 a         7     4
4     4 b         8     6
5     5 a         9     6
6     6 b        10     8

My code works OK for one variable, but I want to apply the same functions to ALL the numerical variables. In the example, that means applying them to the "z" variable as well and storing the result in another new column. Since I have many numerical columns, I don't want to mutate them manually one by one with the approach above. Any advice on how to do this? (Tidyverse solutions preferred, but any help is much appreciated.)

Meisam
  • Does this answer your question? [Sum across multiple columns with dplyr](https://stackoverflow.com/questions/28873057/sum-across-multiple-columns-with-dplyr) – benson23 Mar 08 '23 at 08:12
  • You can use `dplyr::across` for this. i.e. `df %>% mutate(across(c(x, z), ~ case_when(grepl("a", y) ~ .x + 1, grepl("b", y) ~ .x + 2), .names = "{.col}_new"))`. Use the `.names` argument to set the resulting column name. – benson23 Mar 08 '23 at 08:13
  • Also, as stated in your question, it might be more accurate to use `y == "a"` and `y == "b"` in your `case_when` instead of using `grepl` – benson23 Mar 08 '23 at 08:15
  • @benson23 you need `is.numeric` in your `across`. – M-- Mar 08 '23 at 08:19
  • Thanks benson! Yes that worked great! Do you wanna post it as answer so I accept? – Meisam Mar 08 '23 at 08:20
  • @MeisamYSF Glad it worked. Since I think this is a duplicated question, I guess it would be better to just acknowledge the duplicate :) – benson23 Mar 08 '23 at 08:22
  • @benson23 how is this a duplicate of sum across? This has no summing. – zx8754 Mar 08 '23 at 08:24
  • @zx8754 I guess then there will be no duplicate questions on this site if you need to be that specific. The OP is asking for ways to perform the same operation over multiple columns in tidyverse, and we all know that is the `across` function in the dup. BTW the OP specifically asked for a `tidyverse` solution but you've provided a base R alternative. – benson23 Mar 08 '23 at 08:29
  • @zx8754 However, I agree my dup post might not be the best one to reflect the usage of the `across` function, would like to know a better one if you are aware of it. – benson23 Mar 08 '23 at 08:31
  • @benson23 Appreciate you trying to find a duplicate. In this case the linked post is too broad, it is almost like saying read the manual for "across". I don't know tidy enough to answer in tidyverse. – zx8754 Mar 08 '23 at 08:33

1 Answer


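Make a lookup vector that maps "a" and "b" to 1 and 2. Index it by df$y to get one value per row, and add that to df with the y column excluded. Finally, suffix the column names with "_new" and column-bind the result back to the original dataframe: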

ll <- setNames(c(1, 2), c("a", "b"))

x <- df[, -2 ] + ll[ df$y ]
colnames(x) <- paste0(colnames(x), "_new")

cbind(df, x)
#   x y  z x_new z_new
# 1 1 a  5     2     6
# 2 2 b  6     4     8
# 3 3 a  7     4     8
# 4 4 b  8     6    10
# 5 5 a  9     6    10
# 6 6 b 10     8    12
zx8754
  • Thank you very much! This is very clever, but it only works as long as the same function (here, addition) is applied under every condition. My original problem uses different mathematical functions for each condition, which works pretty well with @benson23's solution. – Meisam Mar 08 '23 at 08:32
  • @MeisamYSF Then please make your example as close as possible to your real "mathematical functions". – zx8754 Mar 08 '23 at 08:35
  • Sorry for the ambiguity, this was my first question on SO. I really appreciate your time and effort on this! – Meisam Mar 08 '23 at 08:38
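
The limitation discussed in these comments (one shared operation for all groups) could be worked around in base R by looking up functions instead of numbers. The sketch below is only an illustration: the real per-group functions are never given in the thread, so `funs`, with `+ 1` for "a" and `* 2` for "b", is a hypothetical stand-in, as are the names `num_cols`, `new_cols` and `result`.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  x = 1:6,
  y = rep(c("a", "b"), 3),
  z = 5:10
)

# Hypothetical per-group functions (assumed for illustration only)
funs <- list(
  a = function(v) v + 1,
  b = function(v) v * 2
)

# Names of the numeric columns to transform
num_cols <- names(df)[sapply(df, is.numeric)]

# For each numeric column, apply each group's function to that group's rows
new_cols <- lapply(df[num_cols], function(col) {
  out <- rep(NA_real_, length(col))  # rows in unknown groups stay NA
  for (g in names(funs)) {
    idx <- df$y == g
    out[idx] <- funs[[g]](col[idx])
  }
  out
})
names(new_cols) <- paste0(num_cols, "_new")

result <- cbind(df, as.data.frame(new_cols))
result
```

With these stand-in functions, `x_new` is 2, 4, 4, 8, 6, 12 and `z_new` is 6, 12, 8, 16, 10, 20.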