
I have a list, "na.list", that contains 23 data frames:

str(na.list)
List of 23
 $ YFB:'data.frame':    4383 obs. of  8 variables:
  ..$ Obs     : num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.1.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.2.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.3.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.4.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.5.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.6.AM: num [1:4383] 1 1 1 NA 1 1 1 1 1 1 ...
  ..$ Day.7.AM: num [1:4383] NA NA NA NA NA NA NA NA NA NA ...
 $ YFC:'data.frame':    4383 obs. of  8 variables:
  ..$ Obs     : num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.1.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.2.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.3.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.4.AM: num [1:4383] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day.5.AM: num [1:4383] 1 1 1 1 NA 1 1 1 1 1 ...
  ..$ Day.6.AM: num [1:4383] NA NA NA NA NA NA NA NA 1 1 ...
  ..$ Day.7.AM: num [1:4383] NA NA NA NA NA NA NA NA NA NA ...

And so forth. What I would like to do is replace all NA values with zero. I did this using:

set.na<-function(x,y){replace(x[,y],is.na(x[,y]),0)}
na.list<-lapply(na.list,set.na,y=(1:8))

The issue is that if the first column, "Obs", is NA in a given row, then all of the subsequent columns in that row ought to be set to zero as well. So I tried the following:

set.obs.na<- function(x,y){{replace(x[,y],is.na(x[,1]),0)}}
na.list<-lapply(all.dat,set.obs.na,y=(1:8))
set.na<-function(x,y){replace(x[,y],is.na(x[,y]),0)}
na.list<-lapply(na.list,set.na,y=(2:8))

Where the idea was that the first function would set the 0 values based on the "obs" column first, and then evaluate the rest of the columns. The "set.obs.na" function doesn't work, and returns the error:

Error in [<-.data.frame(*tmp*, list, value = 0) : attempt to select more than one element

I'm not quite sure how best to achieve the result I want, so any suggestions would be greatly appreciated.

As suggested below, I will provide a working example to illustrate my issue:

I have a list of data frames:

df.list<- list(df1 = data.frame(x=c(1,NA, 1,NA), y = c(NA,1,1,1), z=c(1,1,1,NA)), 
 df2 = data.frame(x = c(NA, NA, 1,1), y=c(1,1,1,1), z=c(NA,1,NA,1)))

I wish to replace the NA values with zero. However, I would first like to use the NAs in the x column to determine the values in the y and z columns: if the value in the x column is NA, then whatever the value in the y or z column is, it will be overwritten with zero. So in the above example, the y and z columns in df1 would have their 2nd and 4th values overwritten with zero. I tried doing this with the following:

set.obs.na<- function(a,b){{replace(a[,b],is.na(a[,1]),0)}}
df.list<-lapply(df.list,set.obs.na,b=(1:3))

But I get the following error:

Error in [<-.data.frame(*tmp*, list, value = 0) : attempt to select more than one element

Is there a simple way to replace the values in the y and z column with 0 if the corresponding x value is NA?

  • You could improve your question. Please read [how to provide minimal reproducible examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then perhaps edit & improve it accordingly. A good post usually provides minimal input data, the desired output data & code tries - all copy-paste-run'able in a new/clean R session. To replace `NA`s in a list of data frames, you could use `lst <- list(df1 = data.frame(x=c(1,NA, 3), y = NA), df2 = data.frame(a = c(NA, NA, 10)));lapply(lst, function(df) { df[is.na(df)] <- 0; df})`. – lukeA Oct 03 '16 at 20:05
  • Thanks. I will remember to produce an example in the way you suggested for next time. – Paul Greeley Oct 04 '16 at 11:30
  • Replacing the NA values in each column with zero is not a problem for me, rather it is using the first column to determine subsequent columns that is causing an issue. I've updated my example as suggested. – Paul Greeley Oct 04 '16 at 11:59
  • You are getting the error because you pass on a data frame to the first argument of replace instead of a vector. Maybe `set.obs.na<- function(a,b) setNames(as.data.frame(sapply(b, function(x) replace(a[, x], is.na(a[, 1]), 0))), names(a))` or sth? – lukeA Oct 04 '16 at 12:26
  • @lukeA Thanks that works for my example! I will now try it on my actual data set. – Paul Greeley Oct 04 '16 at 12:59
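
To illustrate the point made in the comments: `replace(x, list, values)` simply performs `x[list] <- values`, and single-bracket indexing on a data frame selects columns, not rows, so a row mask only behaves as intended when `x` is a vector. A minimal sketch of the column-by-column idea, using the question's `df.list`; the helper name `zero_where_first_na` is hypothetical and not from the original thread:

```r
df.list <- list(
  df1 = data.frame(x = c(1, NA, 1, NA), y = c(NA, 1, 1, 1), z = c(1, 1, 1, NA)),
  df2 = data.frame(x = c(NA, NA, 1, 1), y = c(1, 1, 1, 1), z = c(NA, 1, NA, 1))
)

# Hypothetical helper: apply replace() to each column (a plain vector) in turn,
# using the NA positions of the first column as the row mask.
zero_where_first_na <- function(a) {
  mask <- is.na(a[[1]])                                  # TRUE where column 1 is NA
  a[] <- lapply(a, function(col) replace(col, mask, 0))  # a[] keeps the data frame shape
  a
}

res <- lapply(df.list, zero_where_first_na)
```

Like the `sapply` version in the comment, this also zeroes the first column itself in the masked rows; NAs elsewhere (for example the first value of `y` in `df1`) are left untouched.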

1 Answer


If we need to set the values in all other columns to zero wherever the first column is NA, we can change 'set.na' to

set.na <- function(x) { x[is.na(x[[1]]), -1] <- 0; x }  # zero columns 2..n in rows where column 1 is NA
new.list <- lapply(na.list, set.na)
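
As a sketch of how the two steps from the question could be combined (first zero the rows flagged by the first column, then zero any remaining NAs), here is a version run on the question's `df.list`. The function name `fill_zero` is hypothetical, and the second step borrows the `df[is.na(df)] <- 0` idiom from the comments:

```r
df.list <- list(
  df1 = data.frame(x = c(1, NA, 1, NA), y = c(NA, 1, 1, 1), z = c(1, 1, 1, NA)),
  df2 = data.frame(x = c(NA, NA, 1, 1), y = c(1, 1, 1, 1), z = c(NA, 1, NA, 1))
)

# Hypothetical helper combining both steps on one data frame.
fill_zero <- function(df) {
  df[is.na(df[[1]]), -1] <- 0  # step 1: rows with NA in column 1 -> 0 in all other columns
  df[is.na(df)] <- 0           # step 2: every remaining NA -> 0
  df
}

new.list <- lapply(df.list, fill_zero)
```

Row-and-column indexing (`df[rows, cols] <- value`) is used here instead of `replace()` so that the mask is unambiguously applied to rows and the first column is kept in the result.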
akrun