
I'm trying to create a new variable lab_conf based on whether either of two other variables, diagnosis and PC_R, meets a condition. This is the code I'm using:

mutate(lab_conf = ifelse( (diagnosis == "confirmed")|(PC_R == "pos"), "pos", "neg"))

The output shows NA where it should show "neg", so I'm only getting two values: "pos" or NA. I'd like the new variable to be "pos", "neg", or NA based on the conditions specified, where NA would only appear if both diagnosis and PC_R are NA.

This is what I get with dput(head(x)):

structure(list(diagnosis = structure(c(16L, 16L, 16L, 3L, 16L, 
3L), .Label = c("*un-confirmed", "Cloted sample", "confirmed", 
"Hemolysed sampl", "inadequate sample", "rej (sample leaking)", 
"rej(Hemolyzed sample)", "rej(Hemolyzed)", "rej: sample Hemolyzed", 
"rej: sample leaking", "rej: sample leaking + Hemolyzed", "rej: sample leaking+not convnient tube", 
"repeat sample", "tf", "TF", "un-confirmed"), class = "factor"), 
    PC_R = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("clotted", 
    "hemolyzed", "neg", "not pos", "Not REQUIred", "OTHER", "pos", 
    "QNS", "rej", "repeat sample", "Sample broken", "tf", "TF"
    ), class = "factor"), lab_conf = c(NA, NA, NA, "pos", NA, 
    "pos")), .Names = c("diagnosis", "PC_R", "lab_conf"), row.names = c(NA, 
6L), class = "data.frame")
Ktass
  • You're going to need to provide some sample data for meaningful help. On my console it worked fine, so it's something else with your data. Perhaps you have `factor`s or mixed case? – r2evans Feb 12 '19 at 23:20
  • @r2evans I'm new to SO. How do I provide sample data? I can't simply copy and paste a subset of the data frame; the formatting doesn't work. – Ktass Feb 13 '19 at 08:07
  • Actually, you can (usually, not always). You'll see comments encouraging making the question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Feb 13 '19 at 08:17

2 Answers


Use %in% instead of ==, like so:

library(dplyr)

df <- df %>%
  mutate(lab_conf = ifelse((diagnosis %in% "confirmed") | (PC_R %in% "pos"), "pos", "neg"))

The problem you're experiencing is that the == operator returns NA if either operand is NA, and NA | FALSE also returns NA. Together, these are why your OR expression evaluates to NA whenever diagnosis isn't "confirmed" and PC_R is NA, which in turn makes your ifelse return NA.
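A short base R sketch of those NA rules, using literal values in place of the question's columns (`row_test` is just an illustrative name):

```r
# Comparing anything with NA gives NA, not TRUE or FALSE
print(NA == "pos")       # NA
print("neg" == "pos")    # FALSE

# `|` can only resolve an NA when the other side already decides the result
print(NA | FALSE)        # NA
print(NA | TRUE)         # TRUE

# A row like the first one in the question: diagnosis is "un-confirmed", PC_R is missing
row_test <- ("un-confirmed" == "confirmed") | (NA == "pos")
print(row_test)          # NA
```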

The ifelse call returns "pos" where the test is TRUE and "neg" where it is FALSE, but where the test is NA it returns NA. That's why you're getting NAs.
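The same behavior is easy to see with ifelse() on a literal logical vector (a minimal sketch; `res` is an illustrative name):

```r
# ifelse() takes "yes" where the test is TRUE, "no" where it is FALSE,
# and leaves NA wherever the test itself is NA
res <- ifelse(c(TRUE, FALSE, NA), "pos", "neg")
print(res)   # "pos" "neg" NA
```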

Using %in% gets around this: x %in% "pos" returns FALSE rather than NA when x is NA, so the test is always TRUE or FALSE.
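A minimal base R sketch of the %in% version on a small hypothetical data frame (`df` and its rows are made up, and `with()` stands in for dplyr's mutate() so the snippet runs without packages). One side effect worth knowing: a row where both diagnosis and PC_R are NA comes out as "neg" rather than NA, because both %in% tests are FALSE.

```r
# Hypothetical rows: confirmed/missing, un-confirmed/pos, un-confirmed/missing, both missing
df <- data.frame(
  diagnosis = c("confirmed", "un-confirmed", "un-confirmed", NA),
  PC_R      = c(NA,          "pos",          NA,             NA),
  stringsAsFactors = FALSE
)

# %in% is built on match(), so a missing value is just "not found": FALSE, never NA
print(NA %in% "pos")   # FALSE

# Base R equivalent of the mutate() call in this answer
df$lab_conf <- with(df, ifelse((diagnosis %in% "confirmed") | (PC_R %in% "pos"),
                               "pos", "neg"))
print(df$lab_conf)     # "pos" "pos" "neg" "neg"
```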

LetEpsilonBeLessThanZero

Usually, when you provide sample data you want it to cover all the possible outcomes. In the sample data you provided, PC_R is NA in every row.

I've created some sample data that I think reflects what you're going for, followed by the code to compute lab_conf:

library(dplyr)

temp2 <- structure(list(
  diagnosis = c("unconfirmed", "unconfirmed", "unconfirmed", "confirmed", "confirmed", "confirmed"),
  PC_R = c("pos", "neg", NA, "pos", "neg", NA)
), row.names = c(NA, -6L), class = "data.frame")

temp2 %>% mutate(lab_conf = ifelse(diagnosis == "confirmed" | PC_R == "pos", "pos", "neg"))

   diagnosis PC_R lab_conf
1 unconfirmed  pos      pos
2 unconfirmed  neg      neg
3 unconfirmed <NA>     <NA>
4   confirmed  pos      pos
5   confirmed  neg      pos
6   confirmed <NA>      pos
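The question also asks for NA only when both inputs are missing, which neither a plain == test nor a plain %in% test gives. A minimal sketch of one way to encode those rules, assuming "pos" wins whenever either condition holds and "neg" covers everything else (`lab_conf_rule` and `temp3` are hypothetical names; `temp3` is temp2 with one extra all-missing row):

```r
# Hypothetical helper encoding the rules stated in the question:
#   "pos" if diagnosis is "confirmed" or PC_R is "pos",
#   NA    if both diagnosis and PC_R are missing,
#   "neg" otherwise
lab_conf_rule <- function(diagnosis, PC_R) {
  # %in% treats missing values as "no match", so is_pos is never NA
  is_pos <- (diagnosis %in% "confirmed") | (PC_R %in% "pos")
  both_missing <- is.na(diagnosis) & is.na(PC_R)
  out <- ifelse(is_pos, "pos", "neg")
  out[both_missing] <- NA  # only rows with no information at all stay missing
  out
}

temp3 <- data.frame(
  diagnosis = c("unconfirmed", "unconfirmed", "unconfirmed",
                "confirmed", "confirmed", "confirmed", NA),
  PC_R      = c("pos", "neg", NA, "pos", "neg", NA, NA),
  stringsAsFactors = FALSE
)
temp3$lab_conf <- with(temp3, lab_conf_rule(diagnosis, PC_R))
print(temp3)
```

Inside a dplyr pipeline the same helper can be called as `mutate(lab_conf = lab_conf_rule(diagnosis, PC_R))`. Unlike the output above, row 3 (unconfirmed, PC_R missing) becomes "neg", and only the all-missing row becomes NA.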
user1357015