0

How can I write the following condition (preferably using case_when)? I need to replace the NAs with TRUE in rows where both variables (name and salary) are NA.

df <- data.frame(
  id = 1:5,
  name = c("Rick", "Dan", "Michelle", NA, "Gary"),
  salary = c(623.3, 515.2, 611.0, NA, 843.25),
  start_date = as.Date(c(NA, "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE
)

My desired output looks like this:

  id     name salary start_date
1  1     Rick 623.30 NA
2  2      Dan 515.20 2013-09-23
3  3 Michelle 611.00 2014-11-15
4  4     TRUE   TRUE 2014-05-11
5  5     Gary 843.25 2015-03-27

The condition would be something like this but it does not replace the values in their columns:

case_when(is.na(df$name) & is.na(df$salary) ~ TRUE)

Thanks in advance for your input.

Rui Barradas
Rara
  • 2
    In a data.frame object (like yours), a column can only be of one class. If you replace NA with TRUE in these columns, the TRUEs are coerced to character, and so is the whole salary column. Check for instance `c(TRUE, "a", 1)` – Maël Jun 28 '23 at 10:16
  • 1
    @Maël You are right. But for my purpose, this change of class doesn't matter. – Rara Jun 28 '23 at 10:19
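
The coercion mentioned in the comments above can be checked directly. This is a minimal base R sketch; the short `salary` vector is made up for illustration and is not the question's data frame.

```r
# Atomic vectors hold a single type, so mixing types coerces
# everything to the most general one (here: character).
mixed <- c(TRUE, "a", 1)
class(mixed)
mixed

# The same thing happens when a string is written into a numeric vector
# (illustrative values only).
salary <- c(623.3, NA, 843.25)
salary[is.na(salary)] <- "TRUE"
class(salary)
salary
```

Both `class()` calls report "character", which is why the salary column can no longer be used for arithmetic after the replacement unless it is converted back.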

3 Answers

3

You can count the NAs per row across the chosen columns and replace the values in the rows that meet the condition:

library(dplyr)
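df %>% 
  # rowSums(...) >= 2 is TRUE for rows where both name and salary are NA;
  # "TRUE" is a string, so both columns end up as character
  mutate(across(c(name, salary), ~ replace(.x, rowSums(is.na(across(c(name, salary)))) >= 2, "TRUE")))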

#   id     name salary start_date
# 1  1     Rick  623.3       <NA>
# 2  2      Dan  515.2 2013-09-23
# 3  3 Michelle    611 2014-11-15
# 4  4     TRUE   TRUE 2014-05-11
# 5  5     Gary 843.25 2015-03-27

And the case_when version:

df %>% 
  mutate(across(c(name, salary), ~ case_when(rowSums(is.na(across(c(name, salary)))) >= 2 ~ "TRUE",
                                             .default = as.character(.x))))
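
The `.default` argument of `case_when()` used above was added in dplyr 1.1.0. For older versions, a possible variant is sketched below: it uses the long-standing `TRUE ~ ...` fallback and computes the row condition once in a helper column. The names `both_na` and `out` are assumptions introduced for this sketch, and `start_date` is left out to keep it short.

```r
library(dplyr)

df <- data.frame(
  id = 1:5,
  name = c("Rick", "Dan", "Michelle", NA, "Gary"),
  salary = c(623.3, 515.2, 611.0, NA, 843.25),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(
    # helper column (hypothetical name): TRUE where both values are missing
    both_na = is.na(name) & is.na(salary),
    # `TRUE ~ ...` is the catch-all branch; every branch must return
    # character, hence as.character(.x)
    across(c(name, salary),
           ~ case_when(both_na ~ "TRUE",
                       TRUE ~ as.character(.x)))
  ) %>%
  select(-both_na)

out
```

Computing the condition once avoids repeating the nested `across()` call for each column, at the cost of a temporary column that has to be dropped at the end.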
Maël
  • Thanks for your nice solutions. However, my problem seems to be more complicated. I am asking this question in relation to my previously asked question [link](https://stackoverflow.com/questions/76439137/how-to-check-different-conditions-in-a-data-frame-based-on-the-values-in-another). If you could kindly take a look at it I would be really grateful. There is one accepted answer but then I discovered a new condition in my data (which is what I asked) and I need to fix the code to add it. Could you please see if you can help me with it? – Rara Jun 28 '23 at 13:32
1
df[is.na(df$name) & is.na(df$salary), c("name", "salary")] <- "TRUE"

#   id     name salary start_date
# 1  1     Rick  623.3       <NA>
# 2  2      Dan  515.2 2013-09-23
# 3  3 Michelle    611 2014-11-15
# 4  4     TRUE   TRUE 2014-05-11
# 5  5     Gary 843.25 2015-03-27
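
To see the side effect of this assignment on the column types, a quick check along the following lines may help. It is a sketch that rebuilds the question's data frame without `start_date`; `both_na` and `col_classes` are names introduced here for illustration.

```r
df <- data.frame(
  id = 1:5,
  name = c("Rick", "Dan", "Michelle", NA, "Gary"),
  salary = c(623.3, 515.2, 611.0, NA, 843.25),
  stringsAsFactors = FALSE
)

# Logical row index: TRUE only where both columns are missing
both_na <- is.na(df$name) & is.na(df$salary)
df[both_na, c("name", "salary")] <- "TRUE"

# salary was numeric before the assignment and is character afterwards
col_classes <- sapply(df, class)
col_classes
```

Rows where only one of the two values is missing are left untouched, because the index requires both to be NA.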
s_baldur
0

The helper mutate_cond, defined in the Stack Overflow question "dplyr mutate/replace several columns on a subset of rows" and reproduced in the Note at the end, provides a simple way to implement this. It is not part of dplyr itself, so it must be defined before use. Its arguments are the data frame, the condition, and the assignments to make on the rows for which the condition holds.

library(dplyr)

df %>% mutate_cond(is.na(name) & is.na(salary), name = "TRUE", salary = "TRUE")

giving

  id     name salary start_date
1  1     Rick  623.3       <NA>
2  2      Dan  515.2 2013-09-23
3  3 Michelle    611 2014-11-15
4  4     TRUE   TRUE 2014-05-11
5  5     Gary 843.25 2015-03-27

Note

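# see link above
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  # evaluate the condition using the columns of .data, falling back to envir
  condition <- eval(substitute(condition), .data, envir)
  # apply the assignments in ... only to the rows where the condition holds
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}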
G. Grothendieck