
I'm looking for help adding a dummy variable to an existing data frame based on conditions in multiple columns (this last part is what distinguishes my question from the answers I have already found).

Here's a simple example:

y <- c(1, 2, 5, 2, 3, 3)
z <- c("A", "B", "B", "A", "A", "B")
# as.data.frame(y, z) passes z as the row.names argument, so z never becomes a column
df <- data.frame(y = y, z = z)

Now I'd like a third column that takes the value 1 if y equals 2 or z equals "B", and 0 otherwise. The column would therefore be 1 for every observation except the first (y = 1, z = "A") and the fifth (y = 3, z = "A").

I'm sure I know all the ingredients for this; I just can't put them together right now. Any help would be much appreciated!

SpecialK201

1 Answer


A dplyr option using case_when():

y <- c(1,2,5,2,3,3)
z <- c("A", "B", "B", "A", "A", "B")
df <- data.frame(y = y, z = z)

library(dplyr)
df %>%
  mutate(dummy = case_when(y == 2 | z == "B" ~ 1,
                           TRUE ~ 0))
#>   y z dummy
#> 1 1 A     0
#> 2 2 B     1
#> 3 5 B     1
#> 4 2 A     1
#> 5 3 A     0
#> 6 3 B     1

Created on 2022-07-19 by the reprex package (v2.0.1)
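For comparison, here is a base R sketch of the same dummy without dplyr. It relies on the fact that a logical vector coerced with as.integer() becomes 1 for TRUE and 0 for FALSE, so the combined condition can be converted directly. Note that this gives an integer column, whereas the case_when() version above gives a double column; both hold 0/1.

```r
# Base R sketch: coerce the combined logical condition to 0/1
y <- c(1, 2, 5, 2, 3, 3)
z <- c("A", "B", "B", "A", "A", "B")
df <- data.frame(y = y, z = z)

# TRUE (y is 2 or z is "B") becomes 1L, FALSE becomes 0L
df$dummy <- as.integer(df$y == 2 | df$z == "B")
df
#>   y z dummy
#> 1 1 A     0
#> 2 2 B     1
#> 3 5 B     1
#> 4 2 A     1
#> 5 3 A     0
#> 6 3 B     1
```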

Quinten
I got it to work; just one follow-up question. Can I add a condition so that if either y or z is NA, the dummy is also NA? For my purposes I could delete missing values before the operation, but I'd like a less brute-force way. – SpecialK201 Jul 20 '22 at 10:49
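One possible approach to the follow-up, sketched here assuming dplyr as in the answer, is to make an explicit missing-value check the first case_when() branch. case_when() uses the first condition that is TRUE for each row and treats an NA condition as no match. Without the extra branch, a row with y = NA and z = "B" would get 1 (because NA | TRUE is TRUE), and a row with y = NA and z = "A" would fall through to TRUE ~ 0, so missing inputs would silently become 0 or 1. The data frame df_na below is made up purely for illustration.

```r
library(dplyr)

# Hypothetical data with missing values, for illustration only
df_na <- data.frame(
  y = c(1, 2, NA, 2, 3, NA),
  z = c("A", NA, "B", "A", "A", "A")
)

df_na <- df_na %>%
  mutate(dummy = case_when(
    is.na(y) | is.na(z) ~ NA_real_,  # checked first: any missing input gives NA
    y == 2 | z == "B"   ~ 1,         # either condition holds
    TRUE                ~ 0          # everything else
  ))

df_na$dummy
# dummy is 0, NA, NA, 1, 0, NA
```

NA_real_ is used rather than a bare NA so that every branch returns a double, which keeps the output type unambiguous across dplyr versions.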