
Let's assume that I have data like this:

df1 <- structure(list(A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8),
                      B = c(0, 1, 1, 0, 0, 1, 4, 9.2, 9, 0, 0, 1),
                      C = c(2, 9, 0, 0, 0, 9, 0, 0, 0, 0, 0, 8)),
                 .Names = c("A", "B", "C"), row.names = c(NA, -12L),
                 class = "data.frame")

Now I would like to create dummy variables for the columns in which the proportion of 0s is greater than 0.5. Each dummy variable should be 0 where the original column is 0, and 1 otherwise. How can I do that with dplyr? I was thinking of `df1 %>% mutate_if(~mean(. == 0) > .5, ~ifelse(. == 0, 0, 1))`, but this modifies the columns in place, and I need to create new variables named, e.g., A01 and C01 while keeping the original A and C.
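
To check which columns pass the 0.5 threshold before creating any dummies, here is a small base R sketch (not from the question or answer). It assumes the data frame is called `df1`, as in the answer, and rebuilds it with `data.frame()` so the snippet runs on its own.

```r
# Rebuild the example data (assumed name df1, matching the answer)
df1 <- data.frame(
  A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8),
  B = c(0, 1, 1, 0, 0, 1, 4, 9.2, 9, 0, 0, 1),
  C = c(2, 9, 0, 0, 0, 9, 0, 0, 0, 0, 0, 8)
)

# df1 == 0 gives a logical matrix; colMeans() turns each column
# into the proportion of values that are exactly 0
prop_zero <- colMeans(df1 == 0)
round(prop_zero, 3)  # A: 0.833, B: 0.417, C: 0.667

# Columns that should get a dummy variable
to_dummy <- names(prop_zero)[prop_zero > 0.5]
to_dummy  # "A" "C"
```

Only A (10 of 12 zeros) and C (8 of 12 zeros) qualify; B has 5 zeros out of 12, so it is left alone.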

jakes

1 Answer

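Wrap the function in `funs()` and give it a name. `mutate_if()` then creates new columns with that name appended as a suffix (`A_01`, `C_01`) instead of overwriting the originals, and `rename_all()` with `str_remove()` drops the underscore: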

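library(dplyr)
library(stringr)
df1 %>% 
   # for columns that are more than half zeros, add a 0/1 dummy named <col>_01
   mutate_if(~mean(. == 0) > .5, funs(`01` = ifelse(. == 0, 0, 1))) %>%
   # remove the "_" separator: A_01 -> A01, C_01 -> C01
   rename_all(str_remove, "_")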
#   A   B C A01 C01
#1  0 0.0 2   0   1
#2  0 1.0 9   0   1
#3  0 1.0 0   0   0
#4  0 0.0 0   0   0
#5  0 0.0 0   0   0
#6  0 1.0 9   0   1
#7  0 4.0 0   0   0
#8  0 9.2 0   0   0
#9  0 9.0 0   0   0
#10 0 0.0 0   0   0
#11 1 0.0 0   1   0
#12 8 1.0 8   1   1
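
One caveat with `rename_all(str_remove, "_")`: it removes the first underscore from every column name, not just from the new ones, so an original column such as a hypothetical `my_var` would become `myvar`. A minimal base R sketch of a safer pattern, anchoring the match to the `_01` suffix only (the column names below are illustrative):

```r
# Names as mutate_if() produces them: originals plus <col>_01 dummies.
# "my_var" is a hypothetical extra column included to show the pitfall.
nms <- c("A", "B", "C", "my_var", "A_01", "C_01")

# Anchoring the pattern to the end of the name only touches the new columns,
# whereas removing the first "_" everywhere would also rename my_var
fixed <- sub("_01$", "01", nms)
fixed  # "A" "B" "C" "my_var" "A01" "C01"
```

In dplyr 1.0.0 or later, the same function could be passed to `rename_with()`, e.g. `rename_with(~ sub("_01$", "01", .x))`.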

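In newer versions of dplyr (1.0.0 and later), use `mutate()` with `across()`. The `.names` argument sets the names of the new columns, so the originals are kept: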

df1 %>%
   mutate(across(where(~ mean(. == 0) > .5),       # columns that are mostly zeros
          ~ as.integer(. != 0), .names = '{.col}01'))  # 0/1 dummy, e.g. A01
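
For comparison, a base R sketch of the same idea without dplyr (an alternative, not part of the answer): select the mostly-zero columns with `colMeans()`, then assign `<col>01` dummies next to the originals. It assumes the data frame is named `df1`.

```r
df1 <- data.frame(
  A = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8),
  B = c(0, 1, 1, 0, 0, 1, 4, 9.2, 9, 0, 0, 1),
  C = c(2, 9, 0, 0, 0, 9, 0, 0, 0, 0, 0, 8)
)

# Columns where more than half of the values are 0
cols <- names(df1)[colMeans(df1 == 0) > 0.5]

# Add <col>01 dummies (0 where the value is 0, 1 otherwise), keeping A, B, C
df1[paste0(cols, "01")] <- lapply(df1[cols], function(x) as.integer(x != 0))

names(df1)  # "A" "B" "C" "A01" "C01"
```

Like the `across()` version, this produces integer dummies via `as.integer()`; the `funs()` version above produces doubles via `ifelse()`.
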
akrun
  • `funs()` was deprecated in dplyr 0.8.0; use a named list of functions instead: `df1 %>% mutate_if(~mean(. == 0) > .5, list(suffix = function(.) {ifelse(. == 0, 0, 1)} ))` – colej1390 Mar 03 '21 at 16:59
  • @user1582665 thanks, I added the `across` that is mostly current – akrun Mar 03 '21 at 17:49