
I have a dataset in which some missing values are coded as -99, and I tried to use the naniar function replace_with_na_all to replace those values with NA. The function does this, but it also seems to convert my factor variables to integers, so the factor level labels are lost.

This happens whether the factor itself already has some true (NA) missing values or not, which you can see in the example below (in tibble1 the factor has a missing value from the start, in tibble2 it does not).

library(tidyverse)
library(naniar)

# Example factor with missing values
tibble1 <- tribble(
  ~x, ~y,
  "a", 1,
  "-99", 2,  # quoted so that column x is a character column
  "c", -99
)

tibble1$x <- as.factor(tibble1$x) 


levels(tibble1$x) <- list("A" = "a",
                          "C" = "c")

# Showing original tibble and then after replace_with_na_all is used
tibble1
tibble1 %>% naniar::replace_with_na_all(condition = ~.x == -99) 




# Example factor without missing values
tibble2 <- tribble(
  ~x, ~y,
  "a", 1,
  "b", 2,
  "c", -99
)

tibble2$x <- as.factor(tibble2$x) 


levels(tibble2$x) <- list("A" = "a",
                          "B" = "b",
                          "C" = "c")

# Showing original tibble and then after replace_with_na_all is used
tibble2
tibble2 %>% naniar::replace_with_na_all(condition = ~.x == -99)  

There is no error message. I just did not expect this behavior, and I can't find a reason for it (or a way around it) in the documentation. Is this a bug? A feature?

Help.

1 Answer


Is there a specific reason to use naniar, or can you use dplyr? dplyr preserves the data types in your columns:

> dplyr::mutate_all(tibble1, ~ replace(., . == -99, NA))
# A tibble: 3 x 2
  x         y
  <fct> <dbl>
1 a         1
2 NA        2
3 c        NA

> dplyr::mutate_all(tibble2, ~ replace(., . == -99, NA))
# A tibble: 3 x 2
  x         y
  <fct> <dbl>
1 a         1
2 b         2
3 c        NA
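
For newer dplyr versions, where mutate_all() is superseded by across(), the same idea can be sketched as follows. This is only a minimal sketch, assuming dplyr 1.0.0 or later; the data frame `df` and the result name `cleaned` are made up for illustration and simply mirror tibble2 from the question.

```r
library(dplyr)
library(tibble)

# Hypothetical example data mirroring tibble2 from the question
df <- tibble(
  x = factor(c("A", "B", "C")),
  y = c(1, 2, -99)
)

# Replace -99 with NA in every column.
# replace() assigns into the existing vector, so each column keeps its class.
cleaned <- df %>%
  mutate(across(everything(), ~ replace(.x, .x == -99, NA)))

class(cleaned$x)   # the factor column is still a factor
levels(cleaned$x)  # and its level labels are untouched
```

Because replace() modifies the column in place instead of rebuilding it, factor columns should keep their levels, and only the matching cells become NA.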
mysteRious

Thank you! This was quite helpful. I ended up using another process that worked, but my main motivation for using naniar was that I did not know how to do this with the tidyverse. Still, I think this naniar function should not behave this way. I'm new to all this. Do you think I should submit this as a bug on GitHub? – Vasco Brazao Oct 31 '19 at 10:36