I have been working on a project and I am stuck on the following:
I have 7 columns in which over 30% of the rows are NA. All of my columns are numeric, by the way.
For each of these high-missing-value columns, I want to create 4 new columns based on the quantiles of that column's values.
1st column: 1 in rows that contain data; 0 otherwise.
2nd column: 1 in rows that are below the first quantile; 0 otherwise.
3rd column: 1 in rows that are in the 2nd quantile range; 0 otherwise.
4th column: 1 in rows that are above the 3rd quantile; 0 otherwise.
I got the first column, but the rest, which depend on the quantile thresholds, have been a challenge. Here is what I have so far.
My next 3 columns are based on just 3 quantiles: 33.33333%, 66.66667% and 100%.
quantile(High_NAS_set1$EFX, probs = c(33/99, 66/99, 99/99), na.rm = TRUE)
# 1st column: assign 1 to rows that contain data; 0 otherwise
New.EFX <- High_NAS_set1$EFX  # copy the source column
New.EFX[!is.na(New.EFX)] <- 1
New.EFX[is.na(New.EFX)] <- 0
# 2nd column: assign 1 to rows below the first quantile; 0 otherwise
New2.EFX_Emp_Total <- High_NAS_set1$EFX  # copy the source column
quant <- quantile(New2.EFX_Emp_Total, probs = 33/99, na.rm = TRUE)
which(New2.EFX_Emp_Total <= quant) <- 1  # assign 1 to rows whose values are at or below quant
which(New2.EFX_Emp_Total != quant) <- 0
The last 2 lines are giving me an error:
Error in which(New2.EFX_Emp_Total <= quant) <- 1 :
could not find function "which<-"
Any help will be really appreciated. Thanks, Jean
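
A possible reading of the error, with a minimal sketch of one way around it: R treats `which(...) <- 1` as a call to a replacement function named `which<-`, and no such function exists. Assignment normally targets the vector itself, with the condition (or `which(...)`) placed inside the square brackets. The sketch below uses a made-up vector `efx` in place of `High_NAS_set1$EFX`, and the names `cuts`, `has_data`, `low`, `mid` and `high` are hypothetical. It also assumes that "above the 3rd quantile" means the top third (values above the 66.67% cut point), since nothing can lie above the 100% quantile, and that rows with NA should get 0 in every indicator column.

```r
# Toy stand-in for High_NAS_set1$EFX (made-up values, not the real data).
efx <- c(1, 2, 3, NA, 4, 5, 6, NA, 7, 8, 9)

# Cut points at one third and two thirds, computed on the non-missing values.
cuts <- quantile(efx, probs = c(1/3, 2/3), na.rm = TRUE)

# 1st column: 1 where the row contains data, 0 where it is NA.
has_data <- as.integer(!is.na(efx))

# Start each indicator at 0, then set selected rows to 1.
# which() drops NA comparisons, so NA rows stay 0.
low  <- integer(length(efx))
mid  <- integer(length(efx))
high <- integer(length(efx))

# The index goes inside the brackets: x[which(cond)] <- value.
low[which(efx <= cuts[1])] <- 1L                    # bottom third
mid[which(efx > cuts[1] & efx <= cuts[2])] <- 1L    # middle third
high[which(efx > cuts[2])] <- 1L                    # top third (assumption)

result <- data.frame(EFX = efx, has_data, low, mid, high)
print(result)
```

Whether a value exactly equal to a cut point belongs to the lower or the upper group is a choice; here it goes to the lower one (`<=`). The same steps could be wrapped in a function and applied to each of the 7 columns.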