I am trying to check the values of various binary columns and then mutate a new column populated with a value conditional to those columns.
The new column should indicate which column included 1 when all others were 0 and record the number of that column. If multiple 1s are present, a zero should be recorded. Y1 is included as an indicator that some columns need to be ignored.
For example,
if X1=0, X2=0, X3=1; I'd like Xfoo to be 3,
if X1=1, X2=0, X3=1; I'd like Xfoo to be 0, etc
So far I have been achieving this using dplyr mutate
and case_when
as follows:
# create toy data:
set.seed(123)
N<-10
mydata <- data.frame(Y1 = sample(0 : 1, N, replace = TRUE),
X1 = sample(0 : 1, N, replace = TRUE),
X2 = sample(0 : 1, N, replace = TRUE),
X3 = sample(0 : 1, N, replace = TRUE))
# perform mutate:
mydata <- mydata %>%
mutate(
Xfoo = case_when(
X1 == 1 & X2 == 0 & X3 == 0 ~ 1,
X1 == 0 & X2 == 1 & X3 == 0 ~ 2,
X1 == 0 & X2 == 0 & X3 == 1 ~ 3,
TRUE ~ 0)
)
I am happy with this approach given small numbers of variables, however, I realise that this quickly becomes less effective with large numbers of variables.
To combat this, I'd have thought that grouping conditions in some way would keep things concise. Something like:
case_when(
X1 == 1 & c(X2, X3) ==0 ~ 1
...
I realise that my logic is wrong here, I've been experimenting with all()
and any()
but can't figure it out and would appreciate some pointers.