I'm trying to drop columns from a large data set when a column contains too many NA
values. The data set has 1007 variables. I came up with the following code, but it fails with an error:
> for(i in 1:1007){
+ if (length(which(is.na(train3[i])=="TRUE"))>1955) train3[i]<-NULL
+ else train3[i]<-train3[i]
+ }
Error in which(is.na(train3[i]) == "TRUE") :
error in evaluating the argument 'x' in selecting a method for function 'which': Error in `[.data.frame`(train3, i) : undefined columns selected
So I'm trying to eliminate the columns that have more than 1955 NAs. Is there a way to make this work?
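
For what it's worth, the error appears to come from deleting columns inside the loop: each `train3[i] <- NULL` shrinks the data frame, so later values of `i` point past the last remaining column. A minimal loop-free sketch of what I'm after is below. It assumes `train3` is a data frame; the toy data, the small threshold `max_na`, and the name `train3_clean` are made up for illustration (the real threshold would be 1955).

```r
# Toy data frame standing in for train3 (hypothetical data, for illustration only)
train3 <- data.frame(
  v1 = c(1, NA, NA, NA),
  v2 = c(1, 2, 3, NA),
  v3 = c(NA, NA, NA, NA),
  v4 = c("x", "y", "z", "w")
)

# Threshold on the number of NAs per column; the real data would use 1955
max_na <- 2

# Count the NAs in every column in one pass
na_counts <- colSums(is.na(train3))

# Keep only the columns whose NA count does not exceed the threshold;
# drop = FALSE keeps the result a data frame even if one column survives
train3_clean <- train3[, na_counts <= max_na, drop = FALSE]

names(train3_clean)
```

Because the columns are selected in a single step rather than removed one at a time, the column indices never shift while the counts are being checked.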