
I am looking for a clever way to replace a specific NA value with a character.

An example data frame looks like this:

library(tidyverse)
df <- tibble(genes=c("A","B","C"), x=c(NA,NA,4), y=c(NA,3,4))
df
#> # A tibble: 3 × 3
#>   genes     x     y
#>   <chr> <dbl> <dbl>
#> 1 A        NA    NA
#> 2 B        NA     3
#> 3 C         4     4

Created on 2023-03-27 with reprex v2.0.2

I want to replace the NA value in the x column with "Yes" when genes == "A". The rest of the values in the x column should stay the same. I tried mutate() and case_when(), but I am getting an error.

I want my data to look like this:

#> # A tibble: 3 × 3
#>   genes x         y
#>   <chr> <chr> <dbl>
#> 1 A     Yes      NA
#> 2 B     <NA>      3
#> 3 C     4         4
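The failing case_when() attempt isn't shown, but a likely cause is that case_when(), like dplyr::if_else(), refuses to mix the string "Yes" with the double column x. A minimal sketch of a working version, assuming it is acceptable for x to become a character column:

```r
library(dplyr)

df <- tibble(genes = c("A", "B", "C"), x = c(NA, NA, 4), y = c(NA, 3, 4))

out <- df %>%
  mutate(
    # "Yes" is a string, so x must be character before case_when() can combine them
    x = as.character(x),
    x = case_when(
      is.na(x) & genes == "A" ~ "Yes",  # only the NA in gene A's row
      TRUE ~ x                          # every other value is kept as-is
    )
  )
out
```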

halfer
LDT

3 Answers


Since you already have answers in dplyr, I'll offer solutions in base R and data.table.

Base R:

df$x[is.na(df$x) & df$genes == "A"] <- "Yes"
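In base R this assignment quietly converts the whole column. A minimal check, using a plain data.frame only to keep the sketch dependency-free, shows x switching from numeric to character, with 4 becoming "4":

```r
df <- data.frame(genes = c("A", "B", "C"), x = c(NA, NA, 4), y = c(NA, 3, 4))
class(df$x)  # "numeric" before the assignment

# Assigning a string into a numeric vector coerces the entire vector to character
df$x[is.na(df$x) & df$genes == "A"] <- "Yes"
class(df$x)  # "character" afterwards
df$x
```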

data.table requires you to change the column to character explicitly, instead of relying on implicit type conversion as base R does.

library(data.table)
df$x <- as.character(df$x)
setDT(df)[is.na(x) & genes == "A", x := "Yes"]
df

#   genes    x  y
#1:     A  Yes NA
#2:     B <NA>  3
#3:     C    4  4
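The conversion can also stay inside data.table syntax by chaining two `:=` calls. A sketch, assuming it is fine to modify df by reference (both setDT() and `:=` do this):

```r
library(data.table)

df <- data.frame(genes = c("A", "B", "C"), x = c(NA, NA, 4), y = c(NA, 3, 4))

# setDT() turns df into a data.table in place; the first := makes x character,
# the second := fills only the NA in gene A's row
setDT(df)[, x := as.character(x)][is.na(x) & genes == "A", x := "Yes"]
df
```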
Ronak Shah

Quinten's answer is the simplest. I offer an alternative that is more defensive and declarative. (Translation: when I look at code that I wrote 6+ months ago, I often scratch my head wondering why certain things happen. Because of this, I tend to be a bit more "declarative" in certain steps to remind future-me of some steps.)

Background: ifelse is not class-safe; it does not fail or warn when the classes of yes= and no= differ. (It also silently drops some object classes; cf. How to prevent ifelse() from turning Date objects into numeric objects.) Examples:

ifelse(c(T,T), 1, "A")
# [1] 1 1
ifelse(c(T,F), 1, "A")
# [1] "1" "A"
ifelse(c(T,F), Sys.Date(), 1+Sys.Date())
# [1] 19443 19444
ifelse(c(T,F), Sys.time(), 1+Sys.time())
# [1] 1679933419 1679933420
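The Date case is where dplyr::if_else() earns its keep: it keeps the class that base ifelse() drops. A minimal side-by-side comparison, using a fixed date instead of Sys.Date() so the numbers are reproducible:

```r
library(dplyr)

d <- as.Date("2023-03-27")

# base ifelse() builds its result from the logical `test`, so the Date class is lost
base_res <- ifelse(c(TRUE, FALSE), d, d + 1)

# dplyr::if_else() checks that `true` and `false` agree and keeps their class
dplyr_res <- if_else(c(TRUE, FALSE), d, d + 1)

class(base_res)   # "numeric"
class(dplyr_res)  # "Date"
```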

dplyr::if_else fails when the classes differ, and this is good: it protects the user from unknowingly changing the class of the object. It requires the user to make the classes match explicitly:

df %>%
  mutate(x = if_else(is.na(x) & genes == 'A', 'Yes', x))
# Error in `mutate()`:
# ! Problem while computing `x = if_else(is.na(x) & genes == "A", "Yes", x)`.
# Caused by error in `if_else()`:
# ! `false` must be a character vector, not a double vector.
# Run `rlang::last_error()` to see where the error occurred.
df %>%
  mutate(
    x = as.character(x),
    x = if_else(is.na(x) & genes == 'A', 'Yes', x)
  )
# # A tibble: 3 × 3
#   genes x         y
#   <chr> <chr> <dbl>
# 1 A     Yes      NA
# 2 B     <NA>      3
# 3 C     4         4

(We might have used as.character(x) directly within the if_else as well.)
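Folding the conversion into the `false` argument gives a single-expression version. A minimal sketch on the question's data:

```r
library(dplyr)

df <- tibble(genes = c("A", "B", "C"), x = c(NA, NA, 4), y = c(NA, 3, 4))

out <- df %>%
  # "Yes" (true) and as.character(x) (false) are both character, so if_else() accepts them
  mutate(x = if_else(is.na(x) & genes == "A", "Yes", as.character(x)))
out
```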

FYI, coalesce is also useful here, though because the replacement depends on genes, it cannot replace if_else entirely:

df %>%
  mutate(
    x = as.character(x), 
    x = if_else(genes == 'A', coalesce(x, 'Yes'), x)
  )

coalesce is most useful when there is no genes condition; then the replacement can be as simple as x = coalesce(x, "Yes").
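Without the genes condition every NA in x gets filled, including gene B's, which the question does not want. A minimal sketch of that unconditional case:

```r
library(dplyr)

df <- tibble(genes = c("A", "B", "C"), x = c(NA, NA, 4), y = c(NA, 3, 4))

out <- df %>%
  # coalesce() returns the first non-missing value, so every NA in x becomes "Yes";
  # x is made character first because "Yes" is a string
  mutate(x = coalesce(as.character(x), "Yes"))
out$x  # "Yes" "Yes" "4"
```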

r2evans

You could use ifelse() with is.na() and a condition on genes, like this:

library(tidyverse)
df %>%
  mutate(x = ifelse(is.na(x) & genes == 'A', 'Yes', x))
#> # A tibble: 3 × 3
#>   genes x         y
#>   <chr> <chr> <dbl>
#> 1 A     Yes      NA
#> 2 B     <NA>      3
#> 3 C     4         4

Created on 2023-03-27 with reprex v2.0.2

Quinten
  • That's beautiful, Quinten! Thank you! I also wanted to ask whether I could change the NA value by specifying its row and column position, i.e. [1,2]. – LDT Mar 27 '23 at 14:27
  • 3
    @LDT, what I think you're asking about is `df[1,2] <- "Yes"`, but since you're using a `tibble(.)`, the answer is "No", since tibbles are smart enough to not inadvertently change the class of a column. If you first do `df[,2] <- as.character(df[,2])` you can then do `df[1,2] <- "Yes"` without error. One could also do `mutate(x = if_else(row_number() == 1, "Yes", x))` (class notwithstanding). But that begs the question ... why are you trying to do things in a more austere, error-prone way? – r2evans Mar 27 '23 at 14:29
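A minimal sketch of the position-based approach on the question's tibble. `[[` returns the column as a plain vector, so it is used for the conversion before indexing by row and column. Note that this relies on gene A's NA sitting at row 1, column 2, so it silently overwrites the wrong cell if the rows are reordered:

```r
library(tibble)

df <- tibble(genes = c("A", "B", "C"), x = c(NA, NA, 4), y = c(NA, 3, 4))

# Tibbles refuse to put a string into a double column, so convert x first
df[[2]] <- as.character(df[[2]])

# Now a positional assignment works: row 1, column 2 is gene A's x
df[1, 2] <- "Yes"
df
```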