
I have a data frame containing `<NA>` entries. These values are apparently not treated as NA, since is.na returns FALSE for them. I would like to convert them to NA but could not find a way.

Marc B
user34771
  • I'm guessing you're talking about doing this in R? Otherwise, NA is pretty ambiguous... North America? Not available? – Marc B Oct 06 '14 at 16:48
  • Yes sorry in R; NA stands for missing value – user34771 Oct 06 '14 at 16:55
  • Provide a sample of your data by adding the output of `dput(your.data.frame[some.rows.that.contain.such.values,])` to your question. – Roland Oct 06 '14 at 17:05
  • The results of `str(your.data.frame)` would also be useful to let us see how the columns are stored. – Greg Snow Oct 06 '14 at 17:35

3 Answers


Use `dfr[dfr == "<NA>"] <- NA`, where `dfr` is your data frame.

For example:

> dfr<-data.frame(A=c(1,2,"<NA>",3),B=c("a","b","c","d"))

> dfr
     A  B
1    1  a
2    2  b
3 <NA>  c
4    3  d

> is.na(dfr)
         A     B
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE

> dfr[dfr=="<NA>"] = NA                 # key step

> is.na(dfr)
         A     B
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,]  TRUE FALSE
[4,] FALSE FALSE
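
If column A is stored as a factor (as the asker's columns turned out to be), the replacement still works, but the now-unused "<NA>" level stays in the level set. The sketch below is a hedged illustration of that, not part of the original session: `stringsAsFactors = TRUE` is set explicitly to mimic the pre-R-4.0 default, and `droplevels()` is one way to tidy up afterwards.

```r
# Assumption: strings are stored as factors (the default before R 4.0).
dfr <- data.frame(A = c(1, 2, "<NA>", 3), B = c("a", "b", "c", "d"),
                  stringsAsFactors = TRUE)

dfr[dfr == "<NA>"] <- NA                # same key step as above

na_flags     <- is.na(dfr$A)            # FALSE FALSE TRUE FALSE
level_before <- "<NA>" %in% levels(dfr$A)   # TRUE: the unused level is still listed

dfr <- droplevels(dfr)                  # drop unused factor levels in every column
level_after  <- "<NA>" %in% levels(dfr$A)   # FALSE
```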
Ujjwal

The two classes where this is likely to be an issue are character and factor. This should loop over a data frame and convert the "NA" strings into true NA values, but only for columns of those two classes:

make.true.NA <- function(x) if(is.character(x)||is.factor(x)){
                                  is.na(x) <- x=="NA"; x} else {
                                  x}
df[] <- lapply(df, make.true.NA)

(Untested in the absence of a data example.) Assigning to `df[]` rather than to `df` keeps the data frame structure; a plain `df <- lapply(...)` would return a list and lose the data.frame class. I see that Ujjwal thinks your spelling of NA has flanking "<>" characters, so you might try this more general function:

make.true.NA <- function(x) if(is.character(x)||is.factor(x)){
                                  is.na(x) <- x %in% c("NA", "<NA>"); x} else {
                                  x}
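
A small usage sketch follows, since the function above was posted untested. The data frame `dat` and its column names are hypothetical, made up only to exercise one character, one factor, and one numeric column; the helper is repeated so the block runs on its own.

```r
# Same helper as above, repeated so this block is self-contained.
make.true.NA <- function(x) {
  if (is.character(x) || is.factor(x)) {
    is.na(x) <- x %in% c("NA", "<NA>")   # mark matching entries as missing
  }
  x
}

# Hypothetical example data (not from the question).
dat <- data.frame(
  chr = c("a", "NA", "c"),
  fct = factor(c("1.5", "<NA>", "2.5")),
  num = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# dat[] keeps the data.frame class while each column is replaced.
dat[] <- lapply(dat, make.true.NA)

is.na(dat$chr)   # FALSE  TRUE FALSE
is.na(dat$fct)   # FALSE  TRUE FALSE
is.na(dat$num)   # FALSE FALSE FALSE (numeric column left untouched)
```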
IRTFM
  • Thanks for the help. The problem is that I cannot make a reproducible example in which I obtain both NA and `<NA>`. BondedDust's function allowed me to transform both NA and `<NA>` into true NA (is.na(df) now returns TRUE for all of them), but the structure of my df shows that the variables that contain `<NA>` entries are coded as factor and not as numeric. – user34771 Oct 06 '14 at 20:32
  • I suspect you would not want to convert all character vectors to numeric, so you might want to apply this conversion just to particular columns: `dfrm[targets] <- lapply( dfrm[targets], make.true.NA) ; dfrm[targets] <- lapply( dfrm[targets], function(x) as.numeric(as.character(x)))` (going through `as.character` first, because `as.numeric` on a factor returns the level codes) – IRTFM Oct 06 '14 at 21:01
  • Yes, I have to convert to numeric, but it works only if I unlist my dataframe first. I have no idea why it appears as list, but at least it is ok. – user34771 Oct 07 '14 at 06:45
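
One caveat on the factor-to-numeric step discussed in these comments: calling `as.numeric()` directly on a factor returns its integer level codes rather than the numbers shown as labels. The sketch below illustrates the difference and one common workaround, going through `as.character()` first. The data frame `dfrm`, the `targets` vector, and the column names are hypothetical stand-ins.

```r
# Hypothetical factor whose labels are numbers, with one true NA.
f <- factor(c("10", "20", NA, "5"))

codes <- as.numeric(f)                 # 1 2 NA 3: level codes, not the values
vals  <- as.numeric(as.character(f))   # 10 20 NA 5: the labelled values

# Applying the same idea to selected columns of a data frame.
dfrm <- data.frame(
  x  = f,
  y  = factor(c("1.5", "2.5", "3.5", NA)),
  id = c("a", "b", "c", "d")
)
targets <- c("x", "y")                 # assumed names of the columns to convert
dfrm[targets] <- lapply(dfrm[targets],
                        function(col) as.numeric(as.character(col)))
```

Columns not listed in `targets` (here `id`) are left as they were.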

You can do this with the naniar package as well, using replace_with_na and associated functions.


dfr <- data.frame(A = c(1, 2, "<NA>", 3), B = c("a", "b", "c", "d"))

library(naniar)
# dev version - devtools::install_github('njtierney/naniar')
is.na(dfr)
#>          A     B
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,] FALSE FALSE
#> [4,] FALSE FALSE

dfr %>% replace_with_na(replace = list(A = "<NA>")) %>% is.na()
#>          A     B
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,]  TRUE FALSE
#> [4,] FALSE FALSE

# You can also specify how to do this for many variables

dfr %>% replace_with_na_all(~.x == "<NA>")
#> # A tibble: 4 x 2
#>       A     B
#>   <int> <int>
#> 1     2     1
#> 2     3     2
#> 3    NA     3
#> 4     4     4

You can read more about using replace_with_na in the naniar documentation.

Nick Tierney