
I was looking for a way to replace NAs in my dataframe with zeros, and found a great reply here: How do I replace NA values with zeros in an R dataframe?

I used the code in aL3xa's answer to build an example matrix and found the NAs were replaced fine. However, when I applied the code to my own dataframe, it did not seem to work:

sum(is.na(dat.sub))

[1] 453562

dat.sub[is.na(dat.sub)] <- 0

sum(is.na(dat.sub))

[1] 453562

Can anyone suggest what I might be doing wrong?

qwerty

2 Answers

The command works with proper dataframes:

ddf = structure(list(A = c(1L, NA, 3L), B = c(NA, 5L, NA), C = c(5L, 
NA, 7L)), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, 
-3L))

str(ddf)
'data.frame':   3 obs. of  3 variables:
 $ A: int  1 NA 3
 $ B: int  NA 5 NA
 $ C: int  5 NA 7

ddf
   A  B  C
1  1 NA  5
2 NA  5 NA
3  3 NA  7

ddf[is.na(ddf)] <- 0
ddf
  A B C
1 1 0 5
2 0 5 0
3 3 0 7
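
If the same command leaves NAs behind on your own data, one possible cause is that the affected columns are factors. This is only an assumption, since the structure of dat.sub is not shown in the question. Zero is not a valid factor level, so the assignment produces a warning and the NA stays. The sketch below uses a hypothetical data frame `dat` with a factor column `B` to show the symptom and one way around it (converting factors to character first); checking `str(dat.sub)` or `sapply(dat.sub, class)` would tell you whether this applies.

```r
# Hypothetical data: column B is a factor (an assumption about the real data)
dat <- data.frame(A = c(1, NA, 3),
                  B = factor(c("x", NA, "y")))

# Inspect the column classes to spot non-numeric columns
sapply(dat, class)

# The replacement works for A, but 0 is not a level of B,
# so R warns and the NA in B is left in place
suppressWarnings(dat[is.na(dat)] <- 0)
na_left <- sum(is.na(dat))
na_left

# One possible fix: convert factor columns to character, then replace again
is_fac <- sapply(dat, is.factor)
dat[is_fac] <- lapply(dat[is_fac], as.character)
dat[is.na(dat)] <- 0
sum(is.na(dat))
```

Note that after the conversion the replaced entries in `B` are the string "0", not the number 0, so this only makes sense if the column is meant to hold text.
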
rnso

Data frames are essentially lists of vectors of the same length. To change an element in a data frame, you can apply the change to one of those vectors (columns). One way is to index the data frame object (df) as if it were a matrix with named columns:

 df[is.na(df$dat.sub),"dat.sub"]<-0

where dat.sub is the name of the column you are changing. Alternatively, you can change the values directly:

 df$dat.sub[is.na(df$dat.sub)]<-0

by extracting the selected list element, i.e. the column vector, from the data frame. Once you can do it this way, you can use apply and/or lapply to "apply" the replacement to every column in the data frame.
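
As a minimal sketch of the lapply approach mentioned above, the following replaces NAs one column at a time. The data frame `df` and its columns `a` and `b` are hypothetical, made up for illustration.

```r
# Hypothetical data frame with NAs in two numeric columns
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# Replace NAs column by column; assigning into df[] keeps the
# result a data frame instead of a plain list
df[] <- lapply(df, function(col) {
  col[is.na(col)] <- 0
  col
})
df
```

Assigning to `df[]` rather than `df` is a deliberate choice here: lapply returns a list, and the empty-bracket assignment writes that list back into the existing data frame's columns.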

quickreaction