How can I replace missing data with a specific value?

Question

I have a column indicating patient outcomes, but it has many NAs. The column holds several character values (e.g. 'Alive, discharged with supplemental oxygen', 'Deceased', etc.), but I use an index to dichotomize it, so the individual values largely don't matter. My issue is that I need to change the NAs into one of the values in this column ('Outcome'). It can be any value except 'Deceased', including a new one if that works. I want to apply this change across the entire data set, as I will then use the new dichotomized variable in many calculations.

cleanset <- Data1 %>% 
  drop_na(BMI) %>% 
  filter(!complete.cases(.))

The above is the first part of the code I run to clean up the set. I would then like to follow it with a mutation/transformation of the NAs in the 'Outcome' variable.

mutate(cleanset$Outcome = replace_na(cleanset$Outcome, "Alive, discharged with supplemental oxygen"))

The above is my failed attempt at achieving this, so I would appreciate any input on how I can successfully mutate the NAs for this variable to give them a specific value.

Does `cleanset %>% mutate(Outcome = replace_na(Outcome, "Alive, discharged with supplemental oxygen"))` work? – Park, Dec 17 '21 at 00:08
It might also be good to take a look at some beginner tutorials on using dplyr: functions like `mutate` generally operate on a data frame, not vectors, and expect arguments to be passed as bare column names. Such as `mutate(cleanset, Outcome = ...)` — camille, Dec 17 '21 at 15:48

score 1 · Answer 1 · edited Dec 17 '21 at 04:44

Try this

 library(dplyr)

 df %>%
    mutate(outcome_1 = case_when(is.na(outcome) ~ "Alive, discharged with supplemental oxygen", TRUE ~ outcome))

In this code, I assumed that `outcome` is the column containing the NAs, so I created a new column, `outcome_1`, in which every NA from `outcome` is replaced with "Alive, discharged with supplemental oxygen".

I hope this is what you want.
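As a rough sketch of how this compares with the shorter alternatives raised in the comments, the snippet below runs `case_when`, `ifelse` and `tidyr::replace_na` side by side. The toy `df` and the name `fill_value` are made up for illustration and are not from the asker's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the real data set
df <- tibble(outcome = c("Deceased", NA, "Alive, discharged home", NA))
fill_value <- "Alive, discharged with supplemental oxygen"

res <- df %>%
  mutate(
    via_case_when  = case_when(is.na(outcome) ~ fill_value, TRUE ~ outcome),
    via_ifelse     = ifelse(is.na(outcome), fill_value, outcome),
    via_replace_na = replace_na(outcome, fill_value)
  )

# The three new columns should hold the same values, with no NAs left
identical(res$via_case_when, res$via_ifelse)
identical(res$via_ifelse, res$via_replace_na)
anyNA(res$via_replace_na)
```

For a single condition like this one, all three should give the same result, so the choice is mostly about readability.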

answered Dec 17 '21 at 01:40 by Yomi.blaze93 · edited Dec 17 '21 at 04:44 by Lori

What's the advantage of using `case_when` over `ifelse` if you only have a single condition? It's much longer, introduces unusual syntax and a base function would suffice. `tidyr::replace_na` is even shorter – camille Dec 17 '21 at 02:56
Have you been able to resolve it? – Yomi.blaze93 Dec 17 '21 at 06:05
It's pretty straightforward with either `replace_na(outcome, "Alive, discharged with supplemental oxygen")` or `ifelse(is.na(outcome), "Alive, discharged with supplemental oxygen", outcome)`, but this is also a question that has a lot of duplicates on SO, including the one I flagged it as – camille Dec 17 '21 at 15:46
Unfortunately I'm still having issues. I tried @user16087142 's approach first with the relevant syntax ` mutate(Outcome_1=case_when(cleanset$Outcome = is.na(cleanset$Outcome) ~"Alive, discharged home on supplemental oxygen.", TRUE ~ cleanset$Outcome))` but I receive the following error: unexpected '=' in "mutate(Outcome_1=case_when(cleanset$Outcome =". I then tried @camille 's method, first with replace_na (which runs, but the console just says 'character(0)' and nothing changes) and then with if_else, but that also fails to change anything – Clinical Pursuits45 Dec 17 '21 at 18:28
Just for further clarification, when trying @camille 's method, I wrote it up as follows: `replace_na(cleanset$Outcome, "Alive, discharged home on supplemental oxygen.")` `ifelse(is.na(cleanset$Outcome), "Alive, discharged home on supplemental oxygen.", cleanset$Outcome)`. When I make a table with `table(cleanset$outcome)`, the result shows the same values for each category as before. If anyone has any other suggestions I'd truly appreciate it. Thank you! – Clinical Pursuits45 Dec 17 '21 at 18:34
@ClinicalPursuits45 I think there's probably still problems with how you're using dplyr functions and/or whether you're actually assigning anything back to your data frame; see my comment on your question – camille Dec 17 '21 at 18:58
@ClinicalPursuits45 can you share a sample dataset ?? – Yomi.blaze93 Dec 17 '21 at 21:41
So I'm quite confused because it looks like some of these mutate functions are actually working when I input them into the console: `cleanset %>% select(Outcome) %>% mutate(outcomes = replace_na(Outcome, "A"))` The above code results in the console outputting an 'Outcome' column and a modified 'outcomes' column, with the 'outcomes' column having had its NAs replaced by the value 'A'. That's exactly what I need; however, if I then try to view the 'outcomes' vector it doesn't appear?! I've truly never seen this before – Clinical Pursuits45 Dec 18 '21 at 00:27
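A possible explanation for the last comment, in line with camille's remark about assigning back: a dplyr pipeline returns a new data frame and leaves the original untouched, so a result that is only printed is never stored. A minimal sketch, using a made-up three-row `cleanset` in place of the real data:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the asker's data
cleanset <- tibble(Outcome = c("Deceased", NA, "Alive, discharged home"))

# Printed only: cleanset itself is unchanged after this line
cleanset %>% mutate(outcomes = replace_na(Outcome, "A"))
"outcomes" %in% names(cleanset)   # FALSE, the column was never stored

# Assigned back: the new column is now part of cleanset
cleanset <- cleanset %>% mutate(outcomes = replace_na(Outcome, "A"))
"outcomes" %in% names(cleanset)   # TRUE

# Tabulate the stored column, showing NAs if any remain
table(cleanset$outcomes, useNA = "ifany")
```

Note also that R names are case-sensitive, so `cleanset$outcome` and `cleanset$Outcome` refer to different columns; if the column is named `Outcome`, that may be another reason a table of `cleanset$outcome` does not show the change.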
