Replace NA zero dataframe: code not working

Question

I was looking for a way to replace NAs in my dataframe with zeros, and found a great reply here: How do I replace NA values with zeros in an R dataframe?

I used the code in aL3xa's answer to build an example matrix, and the NAs were replaced fine. However, when I applied the same code to my own dataframe, it did not seem to work:

sum(is.na(dat.sub))

[1] 453562

dat.sub[is.na(dat.sub)] <- 0

sum(is.na(dat.sub))

[1] 453562

Can anyone suggest what I might be doing wrong?

You need to provide a [reproducible example](http://stackoverflow.com/a/5963610/1412059). — Roland, Aug 06 '14 at 13:09
It will not work correctly if `0` can't be coerced to the classes of the columns in `dat.sub`. Are all of your columns factors? — celiomsj, Aug 06 '14 at 13:37
@celiomsj OK but there would have been some warning message? — agenis, Aug 06 '14 at 13:47
@qwerty OK, so could you execute dat.sub <- apply(dat.sub,2,as.factor) on your data and then try again and tell us what you get? — agenis, Aug 06 '14 at 14:04
@agenis it works fine now, many thanks for your help! (Although I probably should have thought of this :( ) — qwerty, Aug 06 '14 at 14:54
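The comments above point at factor columns as a likely cause. A minimal sketch of that situation follows, assuming a small made-up data frame `dat` (not the asker's `dat.sub`, which was never posted). Converting factor columns to character first is only one possible workaround and is shown as an illustration, not as what was actually done in this thread.

```r
# Hypothetical data: one numeric column and one factor column, both with NAs.
dat <- data.frame(
  x = c(1, NA, 3),
  f = factor(c("a", NA, "b"))
)

# Whole-frame assignment fills the numeric column, but "0" is not a level of f,
# so R warns ("invalid factor level, NA generated") and f keeps its NA.
filled <- dat
suppressWarnings(filled[is.na(filled)] <- 0)
sum(is.na(filled$x))  # 0
sum(is.na(filled$f))  # 1

# One possible workaround: convert factor columns to character, then replace.
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.character)
dat[is.na(dat)] <- 0
sum(is.na(dat))  # 0
```

Checking `sapply(dat.sub, class)` on the real data would show whether any columns are factors before trying this.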

score 1 · Answer 1 · answered Aug 06 '14 at 13:46

The command works with proper dataframes:

ddf = structure(list(A = c(1L, NA, 3L), B = c(NA, 5L, NA), C = c(5L, 
NA, 7L)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-3L))

str(ddf)
'data.frame':   3 obs. of  3 variables:
 $ A: int  1 NA 3
 $ B: int  NA 5 NA
 $ C: int  5 NA 7

ddf
   A  B  C
1  1 NA  5
2 NA  5 NA
3  3 NA  7

ddf[is.na(ddf)] = 0
ddf
  A B C
1 1 0 5
2 0 5 0
3 3 0 7
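One side effect of this replacement: `0` is a double, so integer columns are converted to numeric. If the integer type matters, assigning `0L` should keep it. The sketch below rebuilds the same example with `data.frame()`; the names `num_fill` and `int_fill` are made up for illustration.

```r
# Same values as the example above, built with data.frame().
ddf <- data.frame(A = c(1L, NA, 3L), B = c(NA, 5L, NA), C = c(5L, NA, 7L))

num_fill <- ddf
num_fill[is.na(num_fill)] <- 0    # 0 is a double: columns become numeric

int_fill <- ddf
int_fill[is.na(int_fill)] <- 0L   # 0L is an integer: columns stay integer

sapply(num_fill, class)
sapply(int_fill, class)
```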

score 0 · Answer 2 · answered Aug 06 '14 at 13:30

Data frames are essentially lists of vectors of the same length. If you want to change an element in the data frame, you must apply the change to one of those vectors. You can do this by indexing the data frame object (df) as if it were a matrix with named columns:

 df[is.na(df$dat.sub), "dat.sub"] <- 0

where dat.sub is the name of the column you are changing. Alternatively, you can select the column from the data frame as a vector and change its values directly:

 df$dat.sub[is.na(df$dat.sub)] <- 0

Once this works for a single column, you can start using apply and/or lapply to "apply" your replacement to every column in the data frame.
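As a rough sketch of the `lapply` route, assuming a small made-up data frame and a hypothetical helper `na_to_zero` (neither is from the question):

```r
# Hypothetical data frame with NAs in every column.
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# Hypothetical helper: replace NA with 0 in a single vector.
na_to_zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

# Assigning into df[] keeps the data.frame structure
# while every column is replaced by its filled version.
df[] <- lapply(df, na_to_zero)
df
```

Writing `df[] <-` rather than `df <-` matters here: `lapply` returns a plain list, and the empty brackets put the result back into the existing data frame.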

This is not entirely correct, something like `df[is.na(df)] = 0` works pretty well. — martin, Aug 06 '14 at 13:36
