
I'm fairly new to R and am trying to recode missing values (stored as -99) to NA. For some reason, this removes all the variable labels from my data frame.

My code is the following:

```r
df <- df %>%
  mutate(across(everything(), ~ ifelse(. == -99, NA, .)))
```

Is there any way to work around this, or another command I could use instead? Thank you very much in advance!

Here is some of the data I'm using:

```r
structure(list(yrbrn = structure(c(1965, 1952, 1952, 1969, 1980, 
1975, 1989, 2000, 2005, 1963, 2001, 1985, 2002, 1956, 1999, 1997, 
1953, 1991, 1993, 1966, 2004, 1977, 1964, 1991, 1970, 1990, 1946, 
1944, 1957, 2005, 1997, 1960, 1944, 1982, 1956, 1980, 1964, 1956, 
1957, 1957, 1949, 1997, 1948, -99, 2004, 1961, 1973, 1935, 1983, 
1964), label = "Year of birth", format.stata = "%10.0g", labels = c(`no answer` = -99, 
Refusal = NA, `Don't know` = NA, `No answer` = NA), class = c("haven_labelled", 
"vctrs_vctr", "double")), gndr = structure(c(1, -99, 1, 1, 1, 
1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 
1, 2, -99, 1, 1, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 
2, 2, 2, 1), label = "Gender", format.stata = "%9.0g", labels = c(`no answer` = -99, 
Male = 1, Female = 2, `No answer` = NA), class = c("haven_labelled", 
"vctrs_vctr", "double"))), row.names = c(NA, -50L), class = c("tbl_df", 
"tbl", "data.frame"))
```
Darren Tsai
Stephan
  • Please provide an [MRE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), i.e. paste an example of your data as the output of `dput`, then it's easier to help you, thanks! – starja Feb 20 '23 at 14:25

2 Answers


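It seems that base `ifelse()` is not able to keep the labels of a labelled column: its result takes its attributes from the `test` argument (here `. == -99`), not from the original column, so the `label` and `labels` attributes and the `haven_labelled` class are dropped. You can use `if_else()` from dplyr instead: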

```r
df %>%
  mutate(across(everything(), ~ if_else(. == -99, NA, .)))

# # A tibble: 50 × 2
#        yrbrn        gndr
#    <dbl+lbl>   <dbl+lbl>
#  1      1965  1 [Male]  
#  2      1952 NA         
#  3      1952  1 [Male]  
#  4      1969  1 [Male]  
#  5      1980  1 [Male]  
#  6      1975  1 [Male]  
#  7      1989  1 [Male]  
#  8      2000  2 [Female]
#  9      2005  1 [Male]  
# 10      1963  2 [Female]
# # … with 40 more rows
```
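The attribute loss can be reproduced in base R without haven. A minimal sketch, assuming a plain double vector with a `label` attribute is a fair stand-in for a `haven_labelled` column; `as.vector()` is used only to make the test a bare logical so the demonstration does not depend on how `==` treats attributes:

```r
# Stand-in for a labelled column (assumption: a plain attribute, not haven)
x <- structure(c(1965, -99, 1952), label = "Year of birth")

# Bare logical test vector; as.vector() strips all attributes
is_missing <- as.vector(x == -99)

# ifelse() starts from `test` and fills in values, so x's label is not carried over
y_ifelse <- ifelse(is_missing, NA, x)

# replace() assigns into a copy of x, so the label attribute survives
y_replace <- replace(x, is_missing, NA)

attributes(y_ifelse)
# NULL
attr(y_replace, "label")
# [1] "Year of birth"
```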

You can also use base `replace()`, which assigns into a copy of each column and therefore keeps its attributes:

```r
df %>%
  mutate(across(everything(), ~ replace(.x, .x == -99, NA)))
```
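If you prefer not to depend on dplyr, the same `replace()` approach can be applied to every column in base R. A minimal sketch; `recode_missing()` is a hypothetical helper name and the small data frame is made-up example data:

```r
# Hypothetical helper: set every occurrence of `code` to NA in all columns.
# replace() assigns into a copy of each column, so attributes such as
# "label" are kept; which() drops NA comparisons so existing NAs are harmless.
recode_missing <- function(df, code = -99) {
  # df[] <- ... replaces the columns but keeps the data frame's class
  df[] <- lapply(df, function(col) replace(col, which(col == code), NA))
  df
}

# Made-up example data with a variable label stored as an attribute
example <- data.frame(yrbrn = c(1965, -99, 1952), gndr = c(1, -99, 2))
attr(example$yrbrn, "label") <- "Year of birth"

cleaned <- recode_missing(example)
attr(cleaned$yrbrn, "label")
# [1] "Year of birth"
```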
Darren Tsai
  • Thanks, the second option worked for me :) – Stephan Feb 20 '23 at 15:33
  • @Kaspar Glad to help you. I'm curious why the first one does not work? Any error or warning message? – Darren Tsai Feb 20 '23 at 15:41
  • there appears to be an error in a variable I didn't provide. This is the error message: Error in `mutate()`: ! Problem while computing `..1 = across(everything(), ~if_else(. == -99, NA, .))`. Caused by error in `across()`: ! Problem while computing column `idno`. --- Backtrace: 1. dfhw1 %>% ... 14. dplyr (local) ``(``) Caused by error in `if_else()`: ! `false` must be a logical vector, not a double vector. --- Backtrace: 1. dfhw1 %>% ... 8. dplyr::if_else(idno == -99, NA, idno) – Stephan Feb 20 '23 at 15:49

You can do the following using data.table syntax. The loop is not very elegant, but it works:

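```r
library(data.table)
setDT(df)  # convert df to a data.table by reference
# for each column, set rows equal to -99 to NA
for (x in names(df)) {
  df[get(x) == -99, (x) := NA]
}
```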
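The same loop can also be written with data.table's `set()`, which is intended for assigning by reference inside `for` loops without the overhead of `[.data.table`. A sketch; the small table is made-up example data:

```r
library(data.table)

# Made-up example data
dt <- data.table(yrbrn = c(1965, -99, 1952), gndr = c(1, -99, 2))

# set() modifies dt by reference; which() yields integer(0) for columns
# without any -99, in which case nothing is changed
for (col in names(dt)) {
  set(dt, i = which(dt[[col]] == -99), j = col, value = NA)
}
```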
L--
  • This produces the following error message: Error in `vectbl_as_col_location()`: ! Can't subset columns with `lapply(...)`. ✖ `lapply(...)` must be logical, numeric, or character, not an empty list – Stephan Feb 20 '23 at 14:39