This is my code:

library(dplyr)

# Create dataframe
df <- data.frame(
  col1 = rep(2, 10),
  col2 = rep(4, 10),
  col3 = rep(6, 10),
  col4 = c(NA, rep(8, 9)))

# Create a new variable
df <- df %>%
  mutate(index = ifelse((col1 == 2) + (col2 == 4) + (col3 == 6) + (col4 == 8) >= 3, 0, 1))

According to the ifelse statement, if at least 3 of the 4 conditions are met, the code assigns 0, otherwise 1. The problem is that my dataset contains missing data, which causes trouble with ifelse conditions. In the example, the first row has one missing value. Although three of the four conditions are satisfied, index is NA for that row instead of the expected 0. Any suggestions on how I can deal with this?
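A minimal sketch of why the first row ends up as NA, using plain vectors rather than the data frame. The names `x`, `target`, `hits`, `n_met_naive`, and `n_met_known` are hypothetical and only serve this illustration: a comparison against NA yields NA, and that NA then propagates through the addition and the `>=` test.

```r
# First row of df as a named vector, plus the values each column is compared to
x <- c(col1 = 2, col2 = 4, col3 = 6, col4 = NA)
target <- c(2, 4, 6, 8)

# Comparing NA with anything gives NA, not FALSE
hits <- x == target

# Adding the logical results: a single NA makes the whole count NA
n_met_naive <- hits[[1]] + hits[[2]] + hits[[3]] + hits[[4]]

# Dropping the NA keeps the three conditions that are known to be TRUE
n_met_known <- sum(hits, na.rm = TRUE)

index_naive <- ifelse(n_met_naive >= 3, 0, 1)  # NA, as in the question
index_known <- ifelse(n_met_known >= 3, 0, 1)  # 0, the expected result
```

Ignoring the NA treats an unknown comparison as "not met", which seems to be the behaviour the question expects.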

Ramon_88
  • 2
    You could use `isTRUE` around each statement. That would return `FALSE` for an `NA` value – divibisan Aug 09 '23 at 17:55
  • 2
    Using `%in%` rather than `==` will return `FALSE` for `NA` values. – Gregor Thomas Aug 09 '23 at 18:00
  • Thanks for the answer @GregorThomas, it looks like a good solution. What if I have different conditions, such as `>=` or `<=`? – Ramon_88 Aug 09 '23 at 18:25
  • Hi @divibisan, thanks for your answer. I tried it, but apparently it doesn't work. – Ramon_88 Aug 09 '23 at 19:33
  • What doesn't work? If you've tried all the suggestions in the linked question and they still don't work, show what you tried and why they didn't work and we can give you further help or reopen the question – divibisan Aug 09 '23 at 19:51
  • In that case, divibisan's suggestion will work, `ifelse(isTRUE(col1 > 2) + isTRUE(col2 <= 4) + ...`. The `%in%` option is just a little more concise than all the `isTRUE`s, but `isTRUE` is more general. – Gregor Thomas Aug 09 '23 at 20:21
  • @divibisan this is the code I modified according to your suggestion: ```df <- data.frame( col1 = rep(2, 10), col2 = rep(4, 10), col3 = rep(6, 10), col4 = c(NA, rep(8, 9)) ) df <- df %>% mutate(index = ifelse(isTRUE(col1 == 2) + isTRUE(col2 == 4) + isTRUE(col3 == 6) + isTRUE(col4 == 8) >= 3, 0, 1))``` It returns 1 for every row of the new variable `index` instead of 0. I don't know whether I did something wrong. – Ramon_88 Aug 10 '23 at 10:48
  • 1
    No, it's my fault. `isTRUE` is not vectorized: it returns `FALSE` for any vector longer than one element, so every term was 0. You need to insert `rowwise()` before the `mutate` in the pipe so the expression is evaluated row by row – divibisan Aug 10 '23 at 15:52
  • 1
    A better option is to use `rowSums` with `na.rm = TRUE`: `mutate(df, index = ifelse(rowSums(cbind(col1 == 2, col2 == 4, col3 == 6, col4 == 8), na.rm = TRUE) >= 3, 0, 1))` should do what you want – divibisan Aug 10 '23 at 15:52
  • Hi @divibisan, I tried your code and it works fine now. Thanks! – Ramon_88 Aug 10 '23 at 19:15
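
The two working suggestions from the comments can be put together as a runnable sketch. It mixes in `<=` and `>=` conditions to cover the follow-up about inequalities; those particular conditions, and the column names `n_met` and `index_rowwise`, are assumptions made for illustration and do not come from the thread.

```r
library(dplyr)

df <- data.frame(
  col1 = rep(2, 10),
  col2 = rep(4, 10),
  col3 = rep(6, 10),
  col4 = c(NA, rep(8, 9))
)

# Option 1: rowSums() over a logical matrix; na.rm = TRUE drops NA comparisons,
# so a missing value counts as "condition not met"
df <- df %>%
  mutate(
    n_met = rowSums(cbind(col1 == 2, col2 <= 4, col3 >= 6, col4 == 8),
                    na.rm = TRUE),
    index = ifelse(n_met >= 3, 0, 1)
  )

# Option 2: rowwise() makes each column a length-1 value inside mutate(),
# so isTRUE() can turn an NA comparison into FALSE
df <- df %>%
  rowwise() %>%
  mutate(index_rowwise = ifelse(
    isTRUE(col1 == 2) + isTRUE(col2 <= 4) + isTRUE(col3 >= 6) + isTRUE(col4 == 8) >= 3,
    0, 1
  )) %>%
  ungroup()
```

Both options give 0 for the first row, where three of the four conditions hold and the fourth is unknown. The `rowSums` version stays vectorized, which is likely to matter on larger data frames, while `rowwise()` evaluates the expression once per row.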
