I am new to Machine Learning & R, so my question is a pretty basic one:

I have imported a dataset and performed some modifications and stored the final output in a dataframe named df_final.

Now I would like to replace all empty fields, as well as fields containing "N/A" or "n/a", with NA, so that I can use R's built-in NA-handling functions.

Any help in this context would be highly appreciated.

Cheers! Vivek

imvivran
  • https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame – Logica Feb 25 '20 at 06:34
  • Check the above link – Logica Feb 25 '20 at 06:34
  • How were empty fields, "N/A", and "n/a" generated? If they are strings in the original data before you imported, you can deal with them by assigning `na.strings = c("", "N/A", "n/a")` in `read.table`. – Darren Tsai Feb 25 '20 at 06:49
  • Agree with @DarrenTsai: This should be solved during data import not afterwards. – Roland Feb 25 '20 at 07:11

2 Answers

I agree that the problem is best solved at read-in, by setting na.strings = c("", "N/A", "n/a") in read.table, as suggested by @Darren Tsai. If that is no longer an option because you have already processed the data, and if, as I suspect, you do not want to keep only complete cases (as suggested by @Rui Barradas), the issue can be addressed this way:

DATA:

df_final <- data.frame(v1 = c(1, "N/A", 2, "n/a", "", 3),
                       v2 = c("a", "", "b", "c", "d", "N/A"))
df_final
   v1  v2
1   1   a
2 N/A    
3   2   b
4 n/a   c
5       d
6   3 N/A

SOLUTION:

To introduce NA into empty fields, you can do:

df_final[df_final==""] <- NA
df_final
    v1   v2
1    1    a
2  N/A <NA>
3    2    b
4  n/a    c
5 <NA>    d
6    3  N/A

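To change the other values into NA, you can use lapply with gsub: when the replacement is NA, every element that matches the pattern becomes NA. The pattern is anchored with ^ and $ so that only fields consisting entirely of "N/A" or "n/a" are replaced, not longer strings that merely contain them:

df_final[,1:2] <- lapply(df_final[,1:2], function(x) gsub("^(N/A|n/a)$", NA, x))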
df_final
    v1   v2
1    1    a
2 <NA> <NA>
3    2    b
4 <NA>    c
5 <NA>    d
6    3 <NA>
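
For the read-in route mentioned at the top of this answer, here is a minimal sketch. It assumes the raw data is a comma-separated file; the contents are inlined through the text argument of read.csv purely for illustration, and csv_text and df_read are made-up names, not anything from the question.

```r
# Hypothetical raw file contents, inlined so the example runs on its own.
# In practice you would pass the file path instead of `text = ...`.
csv_text <- "v1,v2
1,a
N/A,
2,b
n/a,c
,d
3,N/A"

# na.strings lists every token that should be read as a missing value.
df_read <- read.csv(text = csv_text,
                    na.strings = c("", "N/A", "n/a"),
                    stringsAsFactors = FALSE)

# Count the NAs per column to confirm the tokens were picked up.
colSums(is.na(df_read))
```

A side benefit of handling this at import is that v1 should come back as a numeric column rather than character, because no stray strings are left in it.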
Chris Ruehlemann

This is a two-step solution.

  1. Replace the bad values with real NA values.
  2. Keep only the complete cases (rows without any NA), using complete.cases.

In base R:

is.na(df1) <- sapply(df1, function(x) x %in% c("", "N/A", "n/a"))
df_final <- df1[complete.cases(df1), , drop = FALSE]
df_final
#  x y
#1 a u
#3 d v

Data creation code.

df1 <- data.frame(x = c("a", "N/A", "d", "n/a", ""),
                  y = c("u", "", "v", "x", "y"))
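
Before dropping rows, it may help to look at the intermediate result of step 1. The sketch below reuses the same df1 and the same replacement step, then counts the NAs per column and shows which rows complete.cases would keep. The name bad is introduced here only for illustration, and stringsAsFactors = FALSE is set explicitly so the columns are character in any R version.

```r
df1 <- data.frame(x = c("a", "N/A", "d", "n/a", ""),
                  y = c("u", "", "v", "x", "y"),
                  stringsAsFactors = FALSE)

# Logical matrix marking every cell that holds one of the "bad" tokens.
bad <- sapply(df1, function(x) x %in% c("", "N/A", "n/a"))

# Step 1 only: turn the marked cells into real NA values.
is.na(df1) <- bad

# How many NAs each column now has, and which rows have none at all.
colSums(is.na(df1))
complete.cases(df1)
```

If most rows turn out to be incomplete, filtering with complete.cases may discard more data than intended, which is worth checking before step 2.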
Rui Barradas