I've been using R for data manipulation for quite a long time, and recently I found that a conditional statement in my R code gives results that look wrong for my data. Here is an example from my data set:

tail(ncmi[which(ncmi$type=='Above'),])
               p      freq freq.pred  pred.lwr pred.upr  type
OTU328  0.0008791327    1         1 0.7224672        1 Above
OTU81   0.0008872229    1         1 0.7224672        1 Above
OTU2322 0.0008953131    1         1 0.7224672        1 Above
OTU55   0.0009087967    1         1 0.7224672        1 Above
OTU6952 0.0009141902    1         1 0.7224672        1 Above
OTU5350 0.0009249771    1         1 0.7224672        1 Above

If the value in freq is greater than the value in pred.upr, the type is set to 'Above'. As you can see, none of the freq values shown here appears to be greater than pred.upr, yet the type is 'Above' for every row.

The conditional statement in my code is as below:

ncmi$type <- ''
for (k in seq_len(nrow(ncmi))) {
  if (ncmi$freq[k] > ncmi$pred.upr[k]) {
    ncmi$type[k] <- 'Above'
  } else if (ncmi$freq[k] < ncmi$pred.lwr[k]) {
    ncmi$type[k] <- 'Below'
  } else {
    ncmi$type[k] <- 'Neutral'
  }
}

Why would this happen?
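
One way to check whether the printed table is hiding a difference is sketched below. It assumes the two columns only look equal because data frames are printed with 7 significant digits by default. The data frame `d` and its values are hypothetical stand-ins, since the real contents of `ncmi$pred.upr` are not known.

```r
# Hypothetical stand-in for two rows of ncmi; the real pred.upr values are unknown.
d <- data.frame(freq = c(1, 1), pred.upr = c(0.99999999, 1))

# At the default of 7 significant digits, both pred.upr values display as 1.
print(d)

# Printing with 17 significant digits shows the stored double values.
print(d, digits = 17)

# The difference itself is the most direct check: nonzero means "not equal".
gap <- d$freq - d$pred.upr
print(gap)

# The comparison works on the stored values, not on the printed ones.
is_above <- d$freq > d$pred.upr
print(is_above)
```

If `gap` is a tiny positive number for the rows in question, the `if` branch is behaving correctly and only the display is misleading.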

Feng Zhou
  • Have you tried any debugging, like printing the values of `ncmi$freq[k]` and `ncmi$pred.upr[k]`? – Guy Incognito Jun 15 '20 at 08:27
  • *as you can see, none of these results in freq is bigger than pred.upr* This is not clear, because the values are rounded when the data is displayed on screen. Try this in your console: `data.frame(x=rep(0.99999999, 5), y=1.00000001)` or `D <- data.frame(x=rep(0.99999999, 5), y=1.00000001); 1 - D[1,1]; D[1,2] - 1` – jogo Jun 15 '20 at 08:27
  • Possibly relevant: https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal – Hong Ooi Jun 15 '20 at 08:28
  • I'm going to guess (since you didn't provide the code) that `pred.upr` is the upper prediction interval endpoint, obtained via an inverse logistic transform. If so, it will always be strictly less than 1 except in extreme cases. Note that asking whether the observed value for a binary response is within the prediction interval doesn't make much sense; that approach assumes a continuous response. – Hong Ooi Jun 15 '20 at 08:32
  • You do not need a for-loop: `ncmi$type <- 'Neutral'; ncmi$type[ncmi$freq > ncmi$pred.upr] <- 'Above'; ncmi$type[ncmi$freq < ncmi$pred.lwr] <- 'Below'` – jogo Jun 15 '20 at 08:38
  • Yes, I have tried that, @GuyIncognito. But both values print as 1: `c(ncmi$freq[which(rownames(ncmi)=='OTU328')],ncmi$pred.upr[which(rownames(ncmi)=='OTU328')])` gives `[1] 1 1` – Feng Zhou Jun 15 '20 at 08:55
  • Perhaps you are right. @HongOoi – Feng Zhou Jun 15 '20 at 08:59
  • Thanks @jogo, this code was written a long time ago, and I didn't fully understand data frames at that time. – Feng Zhou Jun 15 '20 at 09:01
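
The vectorized approach suggested in the comments can be sketched as follows, with the lower bound `pred.lwr` used for the 'Below' case as in the original loop. The toy data frame, its row names, and the tolerance `tol` are assumptions made for illustration only; they are not taken from the real `ncmi` data.

```r
# Hypothetical toy data with the same column names as ncmi.
ncmi <- data.frame(
  freq     = c(1, 0.5, 0.9),
  pred.lwr = c(0.7224672, 0.6, 0.8),
  pred.upr = c(0.99999999, 0.9, 0.95),
  row.names = c("OTU_a", "OTU_b", "OTU_c")
)

# Start with the default label, then overwrite rows outside the interval.
ncmi$type <- "Neutral"
ncmi$type[ncmi$freq > ncmi$pred.upr] <- "Above"
ncmi$type[ncmi$freq < ncmi$pred.lwr] <- "Below"

# Optional variant: ignore differences smaller than a chosen tolerance.
# tol is an assumed value; pick one that suits the scale of the data.
tol <- 1e-6
ncmi$type_tol <- "Neutral"
ncmi$type_tol[ncmi$freq > ncmi$pred.upr + tol] <- "Above"
ncmi$type_tol[ncmi$freq < ncmi$pred.lwr - tol] <- "Below"

print(ncmi, digits = 10)
```

In this toy data the first row is labelled 'Above' by the strict comparison, because 1 is greater than 0.99999999, and 'Neutral' once the tolerance is applied. Which behaviour is appropriate depends on whether such tiny gaps are meaningful for the analysis.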

0 Answers