How to define a function that replaces a vector of typos with NAs in a df?

Question

I am trying to replace a set of typos in a df with NA.

This is what I've got so far:

master_df <- invisible(
  data.frame(lapply(master_df, 
                    function(x) replace(x, as.matrix(x) == c("?", '-',''), NA))))

However the output looks as follows:

#  a    b    c
#1        <NA>
#2 ? <NA> <NA>
#3 1    2    1
#4 2    3    2
#5 3    4    3

And it throws the following warnings:

Warning messages:
1: In as.matrix(x) == c("?", "-", "") : longer object length is not a multiple of shorter object length
2: In as.matrix(x) == c("?", "-", "") : longer object length is not a multiple of shorter object length
3: In as.matrix(x) == c("?", "-", "") : longer object length is not a multiple of shorter object length

The idea is that every value in the set of typos c('?', '-', '') is replaced by NA across the whole df.

How could I accomplish this task?

data

master_df <- structure(list(a = c("", "?", "1", "2", "3"), b = c("", NA, "2", 
"3", "4"), c = c(NA, NA, "1", "2", "3")), class = "data.frame", row.names = c(NA, 
-5L))

You could check out `makemeNA` from [the SOfun package](http://mrdwab.github.io/SOfun). `library(SOfun); makemeNA(master_df, c("?", "-", ""))`. — A5C1D2H2I1M1N2O1R2T1, Feb 22 '21 at 23:08

akrun · Accepted Answer · 2021-02-22T23:02:59.540

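We need %in% instead of ==. The == operator compares element by element and recycles the shorter vector, whereas %in% tests each value for membership in the whole set.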
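To make the difference concrete, here is a minimal sketch using column `a` from the question's data. The variable names `x`, `typos`, `eq` and `isin` are only illustrative.

```r
x <- c("", "?", "1", "2", "3")   # column `a` from the question's data
typos <- c("?", "-", "")

# `==` recycles `typos` along `x`, so each element of `x` is compared with
# only one element of `typos`; R also warns because 5 is not a multiple of 3
eq <- suppressWarnings(x == typos)

# `%in%` checks every element of `x` against the whole `typos` vector
isin <- x %in% typos

print(eq)
print(isin)
```

With this data `eq` contains no TRUE at all, which is consistent with column `a` coming back unchanged in the question's output, while `isin` flags the first two elements.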

library(dplyr)
master_df2 <- master_df %>%
    mutate(across(everything(),
                  ~ replace(., . %in% c("?", '-', ''), NA_character_))) %>%
    type.convert(as.is = TRUE)

Or using base R

master_df[] <- lapply(master_df, function(x)
      replace(x, x %in% c("?", '-', ''), NA_character_))
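Since the question asks for a function, the base R approach could be wrapped roughly as follows. This is only a sketch: the name `replace_with_na` and its `typos` argument are hypothetical, not part of any package.

```r
# Hypothetical helper: the name and the argument names are illustrative only
replace_with_na <- function(df, typos = c("?", "-", "")) {
  # df[] <- ... keeps the data.frame structure while replacing each column
  df[] <- lapply(df, function(col) {
    col[col %in% typos] <- NA
    col
  })
  df
}

# The data from the question
master_df <- data.frame(
  a = c("", "?", "1", "2", "3"),
  b = c("", NA, "2", "3", "4"),
  c = c(NA, NA, "1", "2", "3"),
  stringsAsFactors = FALSE
)

cleaned <- replace_with_na(master_df)
print(cleaned)
```

The columns stay character here; `type.convert(cleaned, as.is = TRUE)` can be applied afterwards if numeric columns are wanted.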

Or using gsub on the matrix form, where an NA replacement turns every fully matched element into NA

master_df[] <- gsub('^(\\?|-|)$', NA, as.matrix(master_df))
master_df <- type.convert(master_df, as.is = TRUE)

A better option is to specify na.strings = c("?", "-", "") while reading the data with read.csv/read.table, so the typos are converted to NA at import time.
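A minimal sketch of the import-time approach, assuming the data arrives as CSV. The inline `csv_text` stands in for a real file and is made up to mirror the question's data. Note that supplying `na.strings` replaces the default `"NA"`, so it is listed explicitly here.

```r
# Inline text stands in for a file on disk (assumption for illustration)
csv_text <- "a,b,c
,,NA
?,NA,NA
1,2,1
2,3,2
3,4,3"

from_csv <- read.csv(
  text = csv_text,
  na.strings = c("?", "-", "", "NA")  # keep "NA" because the default is overridden
)

str(from_csv)
```

Because the typos never enter the data, the columns are parsed directly as integers and no later `type.convert` step is needed.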

score 2 · Answer 2 · answered Feb 22 '21 at 22:59

Perhaps you can try the code below

master_df[] <- replace(as.matrix(master_df), as.matrix(master_df) %in% c("?", "-"), NA)

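which gives the following (the empty strings in row 1 remain because "" is not in the lookup vector; add "" to it to replace those as well)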

> master_df
     a    b    c
1           <NA>
2 <NA> <NA> <NA>
3    1    2    1
4    2    3    2
5    3    4    3

ThomasIsCoding
