How to deal with the presence of NAs using mutate and ifelse?

Question

I have a data frame like this:

A_count | B_count
0       | 0
312     | NA
2       | 23
0       | 2
NA      | NA
13      | 0

I want to create a third column that checks whether at least one of these columns has a value that isn't 0 or NA. So I tried:

library(dplyr)
df <- df %>%
  mutate(new_column = ifelse(A_count > 0 | B_count > 0, "yes", "no"))

So if either of them is greater than 0, the new column should be "yes", and in all other cases (i.e. zeros and NAs) it should be "no". But that's not quite what I get: the new column contains NAs and I'm not getting any "no"s. I'm guessing the NAs are messing it up, but I'm not sure. Thanks in advance for any answer.
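The guess about NAs is right. In R, `NA > 0` is `NA`, and `|` only returns a definite value when the other operand settles it: `NA | TRUE` is `TRUE`, but `NA | FALSE` is `NA`. `ifelse()` then passes that `NA` straight into the new column. A minimal base R sketch, retyping the sample data by hand and assuming both columns are numeric:

```r
# Sample data from the question, assuming both columns are numeric
df <- data.frame(
  A_count = c(0, 312, 2, 0, NA, 13),
  B_count = c(0, NA, 23, 2, NA, 0)
)

print(NA > 0)      # NA: comparisons with a missing value are missing
print(NA | TRUE)   # TRUE: TRUE wins whatever the missing value is
print(NA | FALSE)  # NA: the result depends on the missing value

# The condition from the question is NA wherever it cannot be decided
cond <- df$A_count > 0 | df$B_count > 0
print(cond)

# ifelse() propagates that NA into the result instead of returning "no"
print(ifelse(cond, "yes", "no"))
```

Only the row where both values are missing ends up undecided here; a row such as `312 | NA` is still `TRUE`.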

Try this change `ifelse(A_count>0 | is.na(A_count) | B_count > 0 | is.na(B_count), "yes","no")` — Duck, Oct 21 '20 at 15:16
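With that change, a row where both values are NA gets "yes", because `is.na()` is OR-ed in. If NAs should count as "no", as the question asks, one option is to guard each comparison with `!is.na()`, so every term is definitely `TRUE` or `FALSE` (since `FALSE & NA` is `FALSE`). A minimal base R sketch on the hand-typed sample data; the same expression should also work unchanged inside `mutate()`:

```r
df <- data.frame(
  A_count = c(0, 312, 2, 0, NA, 13),
  B_count = c(0, NA, 23, 2, NA, 0)
)

# !is.na(x) & x > 0 is FALSE (never NA) for missing values,
# so the OR of the two guarded terms is always TRUE or FALSE
df$new_column <- with(df, ifelse(
  (!is.na(A_count) & A_count > 0) | (!is.na(B_count) & B_count > 0),
  "yes", "no"
))
print(df)
```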

score 1 · Answer 1 · Ronak Shah · 2020-10-21T15:27:26

You can use `rowSums`, which lets you handle many columns without naming each one individually:

df$col <- ifelse(rowSums(df > 0, na.rm = TRUE) > 0, 'Yes', 'No')
# Without ifelse:
# df$col <- c('No', 'Yes')[(rowSums(df > 0, na.rm = TRUE) > 0) + 1]
df
#  A_count B_count col
#1       0       0  No
#2     312      NA Yes
#3       2      23 Yes
#4       0       2 Yes
#5      NA      NA  No
#6      13       0 Yes
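To see why this works: `df > 0` compares every cell and returns a logical matrix with `NA` where a value is missing; `rowSums(..., na.rm = TRUE)` then counts the `TRUE` values in each row while skipping the `NA`s, so a count above zero means at least one positive value. A small sketch of the intermediate steps, retyping the sample data by hand as numeric columns:

```r
df <- data.frame(
  A_count = c(0, 312, 2, 0, NA, 13),
  B_count = c(0, NA, 23, 2, NA, 0)
)

# Logical matrix: TRUE/FALSE per cell, NA where the value is missing
positive <- df > 0
print(positive)

# Number of positive values per row; NAs are dropped before summing
n_positive <- rowSums(positive, na.rm = TRUE)
print(n_positive)

df$col <- ifelse(n_positive > 0, "Yes", "No")
print(df)
```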

To do this only for selected columns, subset them first:

cols <- c('A_count', 'B_count')
df$col <- ifelse(rowSums(df[cols] > 0, na.rm = TRUE) > 0, 'Yes', 'No')

To select every column whose name contains '_count', set `cols <- grep('_count', names(df), value = TRUE)` instead.
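A sketch of that `grep()` selection on a data frame that also contains a string column (the `name` column is hypothetical, added only for illustration). `grep()` with `value = TRUE` returns the matching column names, so only the count columns are compared with 0:

```r
df <- data.frame(
  name    = c("a", "b", "c", "d", "e", "f"),  # hypothetical string column
  A_count = c(0, 312, 2, 0, NA, 13),
  B_count = c(0, NA, 23, 2, NA, 0),
  stringsAsFactors = FALSE
)

# Names of all columns containing "_count"
cols <- grep("_count", names(df), value = TRUE)
print(cols)

# Only the selected numeric columns are compared with 0
df$col <- ifelse(rowSums(df[cols] > 0, na.rm = TRUE) > 0, "Yes", "No")
print(df)
```

Running `df > 0` on the whole data frame instead would also compare the `name` strings with `"0"` as text, which can produce spurious `TRUE`s; subsetting avoids that.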

answered Oct 21 '20 at 15:20 by Ronak Shah, edited Oct 21 '20 at 15:27

Thanks a lot, that seems to work fine. But say my df has many more columns than the ones I showed; how can I select only "A_count" and "B_count"? I ask because my real data also has other columns, for example ones containing strings. – tadeufontes Oct 21 '20 at 15:24
Check the updated answer; it now shows how to do this for specific columns only. – Ronak Shah Oct 21 '20 at 15:28

score 0 · Answer 2 · answered Oct 21 '20 at 15:26

With dplyr you can use `rowwise()` together with `c_across()` to select a range of columns and then evaluate the condition on each row. Here is the code:

library(dplyr)
# Evaluate the condition row by row over the columns A_count to B_count
newdf <- df %>%
  rowwise() %>%
  mutate(Var = any(c_across(A_count:B_count) > 0 & !is.na(c_across(A_count:B_count)))) %>%
  mutate(Var = ifelse(Var, 'Yes', 'No'))

Output:

# A tibble: 6 x 3
# Rowwise: 
  A_count B_count Var  
  <chr>   <chr>   <chr>
1 0       0       No   
2 312     NA      Yes  
3 2       23      Yes  
4 0       2       Yes  
5 NA      NA      No   
6 13      0       Yes
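The output above lists the columns as `<chr>`, i.e. the data in that example was stored as text. In that case `>` compares strings rather than numbers (R converts `0` to `"0"`), which happens to give the right answer for this sample but can mislead in general: `"10" > "9"` is `FALSE` in string order. A minimal sketch, assuming the counts arrive as character vectors, of converting them with `as.numeric()` before the check (the rowwise pipeline above could equally run on the converted columns):

```r
# Counts stored as text, as in the <chr> columns shown above (assumption)
df <- data.frame(
  A_count = c("0", "312", "2", "0", NA, "13"),
  B_count = c("0", NA, "23", "2", NA, "0"),
  stringsAsFactors = FALSE
)

# String comparison, not numeric: "10" sorts before "9"
print("10" > "9")  # FALSE

# Convert the count columns to numbers first; NA stays NA
cols <- c("A_count", "B_count")
df[cols] <- lapply(df[cols], as.numeric)

df$Var <- ifelse(rowSums(df[cols] > 0, na.rm = TRUE) > 0, "Yes", "No")
print(df)
```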
