3

I'm dealing with a large dataset and I'm trying to call which row satisfies a condition in both columns. However, I want to exclude certain values from the analysis and I figured setting them as "NA"s would be the best. To complicate this, in case of "NA"s in one of the column, I still want to see if the other column with numerical value still satisfies a condition. Below is an example of my dataset.

col1 = as.numeric(c(10, 2, 15, 2, "NA", 15))
col2 = as.numeric(c(15, 15, 2, 2, 15, "NA"))
test <- data.frame(col1, col2)

Let's say my cutoff is 5, so I want the following result:

  col1 col2   G5
1   10   15  Yes
2    2   15   No
3   15    2   No
4    2    2   No
5   NA   15 Yes
6   15   NA Yes

I tried the following but rows 5 and 6 come back as "NA" and I don't know how to address this.

test$G5 <- ifelse(test$col1 > 5 & test$col2 > 5, "Yes", "No")

  col1 col2   G5
1   10   15  Yes
2    2   15   No
3   15    2   No
4    2    2   No
5   NA   15 <NA>
6   15   NA <NA>

What is the best way of setting up ifelse statement so that "NA" can be taken as "False"?. I think problem is that when ifesle considers either column and logical test is performed with "NA", it can only return "NA".

This is first time I'm posting this so my formatting maybe very bad... sorry about that!

Thank you

David Kim
  • 33
  • 3
  • test$G5 <- ifelse(is.na(test$col1) & (test$col2 > 5) | is.na(test$col2) & (test$col1 > 5) | !is.na(test$col1|test$col2) & (test$col2 > 5 & test$col2 > 5), "Yes", "No") – Shirin Yavari Nov 22 '19 at 21:27
  • Possible duplicate of [Dealing with TRUE, FALSE, NA and NaN](https://stackoverflow.com/questions/16822426/dealing-with-true-false-na-and-nan) – astrofunkswag Nov 22 '19 at 21:27
  • @ShirinYavari I think you dealt with the `NA`'s backwards, they should end up as "No". OP please correct me if I have this wrong – astrofunkswag Nov 22 '19 at 21:31
  • @astrofunkswag based on what is shown in the question, it is right but your answer is backwards. yay nayy?! – Shirin Yavari Nov 22 '19 at 21:33
  • Oh I see that part of the post, that is what OP wants but that would be inconsistent with treating the `NA` as false, it should be ignored / treated as true – astrofunkswag Nov 22 '19 at 21:34

4 Answers4

0

test$G5 <- ifelse((test$col1 > 5 & test$col2 > 5) %in% TRUE, "Yes", "No")

test$G5 <- ifelse(((test$col1 > 5) %in% TRUE | is.na(test$col1)) & ((test$col2 > 5) %in% TRUE | is.na(test$col2)), "Yes", "No")

See this post for more info

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
astrofunkswag
  • 2,608
  • 12
  • 25
0

For me, I'd think the most beneficial way would be to use a dplyr's case_when function and explicitly state how the NA cases you mention should be handled.

Replicating your example (notice that I'm explicitly setting the NAs here. Your NAs were the result of R not being able to handle a character string ("NA") within a numeric vector.

col1 = as.numeric(c(10, 2, 15, 2, NA_real_, 15))
col2 = as.numeric(c(15, 15, 2, 2, 15, NA_real_))
test <- data.frame(col1, col2)

For both the mutate function and case_when function I'm loading dplyr. If you're not familiar with case_when it's like a ifelse with multiple conditionals. Each conditional is followed by a "~" tilde. What comes after the tilde is what gets assigned if the conditional is met. To set "everything else" as some value X you type TRUE ~ "x" as that obviously gets evaluated as true for all the other cases that have not been met in the previous conditionals.

This should do what you want:

library(dplyr)

test <- mutate(.data = test,
               G5 = case_when(col1 > 5 & col2 > 5 ~ "Yes", #Original 
                              (is.na(col1) & col2 > 5) | (col1 > 5 & is.na(col2)) ~ "Yes",
                              TRUE ~ "No")) # Everything else gets the value "No"


test
#>   col1 col2  G5
#> 1   10   15 Yes
#> 2    2   15  No
#> 3   15    2  No
#> 4    2    2  No
#> 5   NA   15 Yes
#> 6   15   NA Yes
Hlynur
  • 335
  • 2
  • 7
0

Here is a way without ifelse or any other decision making instruction (switch or case_when).

i <- with(test, (col1 > 5 & col2 > 5) | is.na(col1) | is.na(col2))
test$G5 <- c("No", "Yes")[i + 1]

test
#  col1 col2  G5
#1   10   15 Yes
#2    2   15  No
#3   15    2  No
#4    2    2  No
#5   NA   15 Yes
#6   15   NA Yes
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
0

Here is another one using rowSums

test$G5 <- c("No", "Yes")[(rowSums(is.na(test) | test > 5) == ncol(test)) + 1]
test

#  col1 col2  G5
#1   10   15 Yes
#2    2   15  No
#3   15    2  No
#4    2    2  No
#5   NA   15 Yes
#6   15   NA Yes

Also another approach could be to replace all NA with Inf and then compare with 5.

test[is.na(test)] <- Inf
test$G5 <- c("No", "Yes")[(rowSums(test > 5) == ncol(test))+ 1]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213