I would like to list all the values in a tibble that I want to convert to missing using the na_if function in the dplyr package, but I don't seem to get it right. Any leads?

library(dplyr)

set.seed(123)

df <- tibble(
  a1 = c("one", "three", "97", "twenty", "98"),
  a2 = c("R", "Python", "99", "Java", "97"),
  a3 = c("statistics", "Data", "Programming", "99", "Science"),
  a4 = floor(rnorm(5, 80, 2))
)

#--- The long route

df1 <- df %>%
  mutate(across(where(is.character), ~na_if(., "97")),
         across(where(is.character), ~na_if(., "98")),
         across(where(is.character), ~na_if(., "99")))

#---- Trial

df2 <- df %>%
  mutate(across(where(is.character),
                ~na_if(., c("97", "98", "99"))))
Moses
  • According to [the docu](https://dplyr.tidyverse.org/reference/na_if.html) it's not possible to use `na_if` that way (i.e. to test for multiple values)... Possible solutions could be found [here](https://stackoverflow.com/questions/27909000/set-certain-values-to-na-with-dplyr) or [here](https://stackoverflow.com/questions/50436248/dplyr-replacing-na-values-in-a-column-based-on-multiple-conditions) and I am sure there are many others... but probably none using `na_if` – dario Nov 04 '21 at 08:04
  • From the duplicate above, adapted for your use case: `df2 <- df %>% mutate(across(where(is.character), ~ifelse( . %in% c("97", "98", "99"), NA, .)))` – dario Nov 04 '21 at 08:10

1 Answer

You can use:

df %>%
  mutate(
      across(
          where(is.character),
          ~if_else(. %in% c("97", "98", "99"), NA_character_, .)
      )
  )
# A tibble: 5 × 4
  a1     a2     a3             a4
  <chr>  <chr>  <chr>       <dbl>
1 one    R      statistics     80
2 three  Python Data           80
3 NA     NA     Programming    76
4 twenty Java   NA             83
5 NA     NA     Science        78

na_if doesn't work here because ~na_if(., c("97", "98", "99")) is basically equivalent to if_else(. == c("97", "98", "99"), NA_character_, .). In other words, it compares the two vectors element by element, recycling the shorter one, instead of checking each value against all three codes. You can see why this is an issue:

> if_else(df$a1 == c("97", "98", "99"), NA_character_, df$a1)
[1] "one"    "three"  "97"     "twenty" NA      
Warning message:
In df$a1 == c("97", "98", "99") :
  longer object length is not a multiple of shorter object length
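
To make the difference concrete, here is a minimal base R sketch contrasting the recycled `==` comparison with the `%in%` membership test. The vector `x` copies column a1 from the question; `codes` and the helper `na_if_in` are hypothetical names introduced for illustration and are not part of dplyr.

```r
x <- c("one", "three", "97", "twenty", "98")
codes <- c("97", "98", "99")

# `==` recycles `codes` to the length of `x`, so each element of `x` is
# compared with only one code: the one at the same (recycled) position.
# The length warning is suppressed here to keep the output readable.
pairwise <- suppressWarnings(x == codes)
pairwise
# [1] FALSE FALSE FALSE FALSE  TRUE

# `%in%` asks, for every element of `x`, whether it matches any of the codes.
membership <- x %in% codes
membership
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Hypothetical helper: set every value found in `values` to NA.
na_if_in <- function(x, values) replace(x, x %in% values, NA)
cleaned <- na_if_in(x, codes)
cleaned
# [1] "one"    "three"  NA       "twenty" NA
```

A helper like this could presumably be dropped into the same across() call, e.g. ~na_if_in(., c("97", "98", "99")). As far as I know, newer dplyr releases (1.1.0 and later) reject a second argument of incompatible length with an error rather than recycling it, so the exact message you get from the trial code may depend on your dplyr version.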
Migwell