I have a data set with several columns. For each column, I want to find a threshold value such that the NA count falls between 1010 and 1020. Here is some example data, followed by the code I tried.

X1       X2      X3
1.51    0.00    0.00
0.31    3.90    0.00
0.64    13.64   0.00
0.26    9.66    0.00
0.36    0.04    0.00
0.51    0.03    0.00
0.30    0.08    0.02
0.01    0.20    0.04
0.02    0.03    0.00
0.00    0.47    0.00
0.00    1.44    5.54
0.00    2.68    0.74
0.03    0.68    5.49
1.72    0.08    1.54

    threshold <- seq(0.5, 5, by = 0.1)
    for (j in threshold) {
      for (i in 1:3) {
        data[, i] <- ifelse(data[, i] > j, data[, i], NA)
        if (sum(is.na(data[, i])) == range(2, 4)) {
          break
        }
      }
    }
  • Please read the info about [how to ask](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Jan 28 '16 at 15:39
  • You can't test if a value is in a range using `==`. You need to test > min and < max. – Roland Jan 28 '16 at 16:02
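
The comment about `==` can be illustrated with a minimal sketch. The value `n_na` is a made-up NA count for illustration only, and inclusive bounds are an assumption here (the comment itself suggests strict `>` and `<`).

```r
# Hypothetical NA count, for illustration only
n_na <- 3

# `==` against range(2, 4) compares element-wise with c(2, 4),
# so it yields two values instead of one range test
elementwise <- n_na == range(2, 4)

# Explicit bounds give a single TRUE/FALSE that is safe inside if()
in_range <- n_na >= 2 && n_na <= 4
```

Here `elementwise` holds two `FALSE` values, while `in_range` is a single `TRUE`.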

1 Answer

Ok, here's how I'd do it.

    threshold <- rep(NA_real_, 50)

    for (i in 3:50) {

      # Count the NAs already present in this column
      nNA <- sum(is.na(pred[, i]))

      # sort() drops NAs, so take the (1015 - nNA)th smallest remaining value
      threshold[i] <- sort(pred[, i])[1015 - nNA]

      # which() skips NA comparisons, so existing NAs do not break the assignment
      pred[which(pred[, i] < threshold[i]), i] <- NA
    }

Edit: Changed to fit all new requirements.
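
For anyone who wants to try the idea without the original data, here is a rough, self-contained sketch. The helper `na_below_kth` is a hypothetical name (not part of the code above), and `pred` is simulated with `rnorm` as a stand-in for the real data. It assumes each column starts with fewer NAs than the target.

```r
# Hypothetical helper: blank out values below the k-th smallest one,
# where k is adjusted for NAs the column already contains
na_below_kth <- function(x, target) {
  n_na <- sum(is.na(x))
  thr <- sort(x)[target - n_na]   # sort() drops NAs by default
  x[which(x < thr)] <- NA         # which() ignores NA comparisons
  list(threshold = thr, values = x)
}

set.seed(1)
pred <- matrix(rnorm(2000 * 3), ncol = 3)  # simulated stand-in data
pred[sample(2000, 25), 2] <- NA            # column 2 starts with some NAs

threshold <- rep(NA_real_, ncol(pred))
for (i in seq_len(ncol(pred))) {
  res <- na_below_kth(pred[, i], target = 1015)
  threshold[i] <- res$threshold
  pred[, i] <- res$values
}

# Resulting NA count per column; worth checking against the 1010-1020 range
na_counts <- colSums(is.na(pred))
```

Without ties, each column ends up with one NA fewer than the target, which is still inside the requested range. With heavily tied data (for example the many zeros in `X3` of the question's sample), no single threshold may land in the range, so checking `na_counts` afterwards is a good idea.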

  • Thank you very much, I was able to calculate the threshold for each column. I also wanted to use it to update the data in `pred`. – NEXUSAG Jan 28 '16 at 16:13
  • Edited so that thresholds are stored in `threshold` and `pred` is updated. Some things could be done a bit cleaner, but at this point I had to edit it so often that I'll leave the perfection up to you :) – slamballais Jan 28 '16 at 16:21