Convert all empty & fields marked with "N/A" as NA in R

Question

I am new to Machine Learning & R, so my question is a pretty basic one:

I have imported a dataset and performed some modifications and stored the final output in a dataframe named df_final.

Now I would like to replace all empty fields, and all fields containing "N/A" or "n/a", with NA, so that I can use R's built-in NA handling.

Any help in this context would be highly appreciated.

Cheers! Vivek

https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame — Logica, Feb 25 '20 at 06:34
How were empty fields, "N/A", and "n/a" generated? If they are strings in the original data before you imported, you can deal with them by assigning `na.strings = c("", "N/A", "n/a")` in `read.table`. — Darren Tsai, Feb 25 '20 at 06:49
Agree with @DarrenTsai: This should be solved during data import not afterwards. — Roland, Feb 25 '20 at 07:11
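
A minimal sketch of the import-time approach suggested in the comments. It assumes the raw data is comma-separated; the inline `csv_text` is a hypothetical stand-in for the real file, which the question does not show. `read.csv` accepts the same `na.strings` argument as `read.table`.

```r
# Hypothetical raw data; in practice this would be a file path
csv_text <- "v1,v2
1,a
N/A,
2,b
n/a,c
,d
3,N/A"

# Treat empty fields, "N/A" and "n/a" as missing while reading
df <- read.csv(text = csv_text,
               na.strings = c("", "N/A", "n/a"),
               stringsAsFactors = FALSE)

str(df)              # v1 is parsed as a number, not as text
colSums(is.na(df))   # missing values per column
```

Because the markers are recognised before column types are inferred, v1 is read as a numeric column rather than as text.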

score 2 · Answer 1 · answered Feb 25 '20 at 10:00

I agree that the problem is best solved at read-in, by setting na.strings = c("", "N/A", "n/a") in read.table, as suggested by @Darren Tsai. If that's no longer an option because you've processed the data already and, as I suspect, you do not want to keep only complete cases, as suggested by @Rui Barradas, then the issue can be addressed this way:

DATA:

df_final <- data.frame(v1 = c(1, "N/A", 2, "n/a", "", 3),
                       v2 = c("a", "", "b", "c", "d", "N/A"))
df_final
   v1  v2
1   1   a
2 N/A    
3   2   b
4 n/a   c
5       d
6   3 N/A

SOLUTION:

To introduce NA into empty fields, you can do:

df_final[df_final==""] <- NA
df_final
    v1   v2
1    1    a
2  N/A <NA>
3    2    b
4  n/a    c
5 <NA>    d
6    3  N/A

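To change the other values into NA, you can use lapply with gsub. Anchoring the pattern with ^ and $ ensures that only fields consisting entirely of "N/A" or "n/a" are replaced, not longer strings that merely contain them:

df_final[,1:2] <- lapply(df_final[,1:2], function(x) gsub("^(N/A|n/a)$", NA, x))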
df_final
    v1   v2
1    1    a
2 <NA> <NA>
3    2    b
4 <NA>    c
5 <NA>    d
6    3 <NA>
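
After the replacement, v1 still holds text ("1", "2", "3") rather than numbers. The sketch below is one possible way to do the substitution in a single pass and then re-infer the column types. The name `na_tokens` is introduced here for illustration, and `stringsAsFactors = FALSE` is assumed so that the behaviour is the same across R versions.

```r
df_final <- data.frame(v1 = c(1, "N/A", 2, "n/a", "", 3),
                       v2 = c("a", "", "b", "c", "d", "N/A"),
                       stringsAsFactors = FALSE)

# Exact-match replacement of every missing-value marker in one pass
na_tokens <- c("", "N/A", "n/a")
df_final[] <- lapply(df_final, function(x) replace(x, x %in% na_tokens, NA))

# The columns are still character here; let R re-infer their types
df_final[] <- lapply(df_final, type.convert, as.is = TRUE)

str(df_final)
mean(df_final$v1, na.rm = TRUE)   # NA-aware functions now work on v1
```

Assigning to `df_final[]` keeps the data frame structure and avoids hard-coding the column range `1:2`.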

score 1 · Answer 2 · answered Feb 25 '20 at 06:46

This is a two-step solution.

1. Replace the bad values with real NA values.
2. Keep only the complete.cases.

In base R:

is.na(df1) <- sapply(df1, function(x) x %in% c("", "N/A", "n/a"))
df_final <- df1[complete.cases(df1), , drop = FALSE]
df_final
#  x y
#1 a u
#3 d v

Data creation code (run this before the snippet above).

df1 <- data.frame(x = c("a", "N/A", "d", "n/a", ""),
                  y = c("u", "", "v", "x", "y"))
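
If the rows with missing values should be kept, as the question seems to imply, the first step can be used on its own. The following is a sketch on the same sample data, not a change to the answer's method:

```r
df1 <- data.frame(x = c("a", "N/A", "d", "n/a", ""),
                  y = c("u", "", "v", "x", "y"))

# Step 1 only: mark the placeholder strings as NA, keep every row
is.na(df1) <- sapply(df1, function(x) x %in% c("", "N/A", "n/a"))

colSums(is.na(df1))   # number of missing values per column
nrow(na.omit(df1))    # rows that complete.cases() would keep
```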
