49

I have a factor with NA values that I would like to replace with a new factor level NA.

a = as.factor(as.character(c(1, 1, 2, 2, 3, NA)))
a
[1] 1    1    2    2    3    <NA>
Levels: 1 2 3

This works, but it seems like a strange way to do it.

# as.character() keeps ifelse() from returning the factor's integer codes
a = as.factor(ifelse(is.na(a), "NA", as.character(a)))
class(a)
[1] "factor"

This is the expected output:

a
[1] 1  1  2  2  3  NA
Levels: 1 2 3 NA
Rich Scriven
marbel
  • Do you want to keep a `NA` or a `"NA"` in the levels and in the vector? Perhaps, instead of `as.character` you might have wanted `paste`? – alexis_laz Nov 28 '14 at 21:20

3 Answers

66

You can use addNA().

x <- c(1, 1, 2, 2, 3, NA)
addNA(x)
# [1] 1    1    2    2    3    <NA>
# Levels: 1 2 3 <NA>

This is basically a convenience function for calling factor() with exclude = NULL. From help(factor):

addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).

Another advantage is that if you already have a factor f, addNA(f) adds NA as a factor level without otherwise changing f. As the documentation mentions, this is handy for tables. It also reads nicely.
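
As a rough illustration of the points above, here is a small base R sketch. The names f, g and h are made up for this example. It shows that addNA() appends an NA level to an existing factor, that table() then counts the missing value, and one possible way to turn the NA level into the string "NA" afterwards if that is what you need.

```r
# Example factor with one missing value (f, g, h are illustrative names)
f <- factor(c("1", "1", "2", "2", "3", NA))

# addNA() keeps the existing levels and appends NA as an extra level
g <- addNA(f)
levels(f)
levels(g)

# With the NA level present, table() counts the missing value too
table(f)
table(g)

# One way to rename the NA level to the string "NA"
h <- g
levels(h)[is.na(levels(h))] <- "NA"
levels(h)
```

Keep in mind that a factor with a real NA level and a factor with a level named "NA" are different things, so pick whichever the downstream code expects.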

Rich Scriven
  • I used addNA. How can I get the NA level as a string like the other levels? My levels look like "1" "2" "3" NA and I want them to be "1" "2" "3" "NA" – HonestRlover.not. Dec 19 '20 at 19:50
26

You can add the NA as a level and change the level name to something more explicit than <NA> using fct_explicit_na from package forcats.

library(forcats)

By default you get the new level as (Missing):

fct_explicit_na(a)

[1] 1         1         2         2         3         (Missing)
Levels: 1 2 3 (Missing)

You can set it to something else:

fct_explicit_na(a, "unknown")

[1] 1       1       2       2       3       unknown
Levels: 1 2 3 unknown
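
If you would rather not depend on forcats, roughly the same result can be had in base R. The helper below, explicit_na(), is a hypothetical name written for this example and is not part of any package. It is only a sketch of the idea (add the new level, then assign it to the missing entries), not a reimplementation of the forcats function.

```r
# Hypothetical helper: replace missing values in a factor with a named level
explicit_na <- function(f, na_level = "(Missing)") {
  f <- as.factor(f)
  if (anyNA(f)) {
    # Add the new level first, otherwise the assignment below yields NA
    levels(f) <- union(levels(f), na_level)
    f[is.na(f)] <- na_level
  }
  f
}

a <- factor(c("1", "1", "2", "2", "3", NA))
b <- explicit_na(a, "unknown")
levels(b)
```

The new level is appended after the existing ones, which matches the ordering shown in the output above.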
aosmith
  • This function is now superseded and one should use `fct_na_value_to_level` instead (from `forcats 1.0.0`) – Maël Jan 31 '23 at 10:12
20

Set the exclude argument to NULL to include NA as a level. Use factor() instead of as.factor(); it does the same thing but has more arguments to set:

a = factor(as.character(c(1, 1, 2, 2, 3, NA)), exclude = NULL)

> a
[1] 1    1    2    2    3    <NA>
Levels: 1 2 3 <NA>
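
One of the extra arguments factor() exposes is levels, and it can be combined with exclude = NULL to decide where the NA level sits, for instance when the level order matters for plotting. The ordering below is arbitrary and chosen only for illustration.

```r
x <- c("1", "1", "2", "2", "3", NA)

# Put the NA level first and reverse the remaining levels
a <- factor(x, levels = c(NA, "3", "2", "1"), exclude = NULL)

levels(a)
as.integer(a)  # integer codes follow the custom level order
```

Without exclude = NULL the NA entry in levels would be dropped again, so both arguments are needed here.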
LyzandeR
  • That's the better option when you want to order factor levels for plotting with `ggplot` :) – tjebo Jul 22 '19 at 10:29