I have a column of type factor, and some of its values are NA. How do I convert all of these NA values to a new level, say 0 or "OriginallyNA"?

I was able to convert NAs to 0 for columns of class numeric, but haven't been able to do it for columns of class factor.

My data

> col1 = c(1,2,3,4,NA)
> col2 = c(6,7,NA,NA,8)
> df = data.frame(col1,col2)
> df
  col1 col2
1    1    6
2    2    7
3    3   NA
4    4   NA
5   NA    8
> df$col2 = as.factor(df$col2)
> class(df$col1)
[1] "numeric"
> class(df$col2)
[1] "factor"

Trying to convert the NA values to another level, say 0

> df[is.na(df)] = 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
> df
  col1 col2
1    1    6
2    2    7
3    3 <NA>
4    4 <NA>
5    0    8
> levels(df$col2)
[1] "6" "7" "8"

Do I have to convert the factor column to numeric, change the NA values to 0, and then convert it back to factor, as follows? Or is there a better way?

> df$col2 = as.numeric(df$col2)
> df
  col1 col2
1    1    1
2    2    2
3    3   NA
4    4   NA
5    0    3
> df[is.na(df)] = 0
> df
  col1 col2
1    1    1
2    2    2
3    3    0
4    4    0
5    0    3
> df$col2 = as.factor(df$col2)
> df
  col1 col2
1    1    1
2    2    2
3    3    0
4    4    0
5    0    3
IAMTubby
  • Why would you want to convert a numeric column to a factor? What are you going to do with it next? Also, `0` is a meaningful number; it is usually bad practice to use it in place of `NA`. – David Arenburg Mar 08 '15 at 08:37
  • Either way, I would do something like `df$col2 <- factor(with(df, replace(col2, is.na(col2), 0)))` – David Arenburg Mar 08 '15 at 08:46
  • You could try `df$col2 <- addNA(df$col2)` as discussed [here](http://stackoverflow.com/questions/27195956/convert-na-in-factor-to-a-new-level) – JACKY88 Mar 08 '15 at 12:46
  • @PatrickLi, thanks. Can you please add this as an answer, so I can accept it? – IAMTubby Mar 08 '15 at 21:07

2 Answers


The warning:

Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated

means that you are trying to assign a value to a factor column that does not exist among its levels. You should first add the missing level, and only then do the assignment you attempted with df[is.na(df)] <- 0.

Here is a helper function that does this for any factor column in your data.frame:

re_levels <- function(col) {
  # add "0" as a valid level so that it can be assigned later
  if (is.factor(col)) levels(col) <- c(levels(col), "0")
  col
}

Then apply it to your data.frame and replace the missing values with 0:

df <- sapply(df,re_levels)
df[is.na(df)] <-  0

#       col1 col2
# [1,]    1    1
# [2,]    2    2
# [3,]    3    0
# [4,]    4    0
# [5,]    0    3
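
Because sapply() simplifies its result to a matrix, the factor column comes back as its integer codes. A minimal sketch of a variant that keeps the data.frame and the factor class, assuming the same df and re_levels as above (both rebuilt here so the snippet runs on its own), uses lapply() with df[] instead:

```r
# rebuild the example data from the question
col1 <- c(1, 2, 3, 4, NA)
col2 <- c(6, 7, NA, NA, 8)
df <- data.frame(col1, col2)
df$col2 <- as.factor(df$col2)

# same helper as above: add "0" as a valid level on factor columns
re_levels <- function(col) {
  if (is.factor(col)) levels(col) <- c(levels(col), "0")
  col
}

# df[] <- lapply(...) replaces the columns in place, so df stays a
# data.frame and col2 stays a factor
df[] <- lapply(df, re_levels)

# "0" is now a known level, so this no longer triggers the warning
df[is.na(df)] <- 0
```

Here col2 keeps its original labels 6, 7 and 8. By contrast, as.numeric() on a factor returns the internal codes, which is why the round trip in the question turns 6, 7, 8 into 1, 2, 3.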
agstudy
  • Thanks, I did try it out. But after the last command, my col2 is not a factor anymore. I want it to remain a factor, so do I just do df$col2 = as.factor(df$col2)? Is it bad to convert the factor column to numeric, change the NA values to 0, and then convert it back to factor? – IAMTubby Mar 08 '15 at 11:03

If you use

df$col2 <- addNA(df$col2)

the factor will get NA as an additional level.
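
If you would rather have a named level such as "OriginallyNA" than a level that is itself NA, one possible follow-up is to rename that level after calling addNA(). This is a sketch on a standalone factor f (a name chosen here for illustration), not something the question requires:

```r
# a factor with two missing values, like col2 in the question
f <- factor(c(6, 7, NA, NA, 8))

# addNA() makes NA an explicit level of the factor
f <- addNA(f)

# rename the NA level to a regular label
levels(f)[is.na(levels(f))] <- "OriginallyNA"
```

After this, the former NA entries carry the label "OriginallyNA", so they show up as their own group in table() or summary().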

JACKY88