I edited this question (hopefully as requested).

I need to check whether the value of every cell in a data.frame lies within a certain range. I am very new to apply and still need to work on understanding it.

I have 2 data.frames:

  • blood_df: 158 columns

  • stat_df: statistics for every column of blood_df

Below is a minimal example. This is what I have so far, but the last step calculates the same result for every cell:

c0 <- c(0,0,0,0)
c1 <- c(1,2,3,4)
c2 <- c(5,6,7,8)
c3 <- c(9,10,11,12)
c4 <- c(13,14,15,16)

blood_df <- data.frame(c0,c1,c2,c3,c4)
stat_df <- data.frame(matrix(ncol = 5, nrow = 6))
colnames(stat_df) <- colnames(blood_df)
rownames(stat_df) <- c("Mean","3*sd","sum", "Mean2","-3*sd","sum2" )

# Rows 1-3: column mean, 3*sd, and their sum (upper limit = mean + 3*sd)
stat_df[1, 2:5] <- apply(blood_df[, 2:5], 2, mean, na.rm = TRUE)
stat_df[2, 2:5] <- apply(blood_df[1:4, 2:5], 2, function(x) 3 * sd(x, na.rm = TRUE))
stat_df[3, ] <- colSums(stat_df[1:2, ])
# Rows 4-6: column mean, -3*sd, and their sum (lower limit = mean - 3*sd)
stat_df[4, 2:5] <- apply(blood_df[, 2:5], 2, mean, na.rm = TRUE)
stat_df[5, 2:5] <- apply(blood_df[1:4, 2:5], 2, function(x) -3 * sd(x, na.rm = TRUE))
stat_df[6, ] <- colSums(stat_df[4:5, ])

blood_df:
##   c0 c1 c2 c3 c4
## 1  0  1  5  9 13
## 2  0  2  6 10 14
## 3  0  3  7 11 15
## 4  0  4  8 12 16

stat_df:
##       c0        c1        c2        c3        c4
## Mean  NA  2.500000  6.500000 10.500000 14.500000
## 3*sd  NA  3.872983  3.872983  3.872983  3.872983
## sum   NA  6.372983 10.372983 14.372983 18.372983
## Mean2 NA  2.500000  6.500000 10.500000 14.500000
## -3*sd NA -3.872983 -3.872983 -3.872983 -3.872983
## sum2  NA -1.372983  2.627017  6.627017 10.627017 

The part that is not working as I need it:

blood_df[1:4,2:5] <- apply(blood_df[,2:5],2,  function(x) 
                   (ifelse((x > (stat_df[3,2:5]))|| 
                   (x < (stat_df[6,2:5])), NA, x)))

So far it gives me:

blood_df:
##   c0 c1 c2 c3 c4
## 1  0  1  1  1  1
## 2  0  5  5  5  5
## 3  0 NA NA NA NA
## 4  0 NA NA NA NA

What I'd like to get is the following (every value checked against the range for its own column):

blood_df:
##   c0 c1 c2 c3 c4
## 1  0  1  5  9 13
## 2  0  2  6 10 14
## 3  0  3  7 11 15
## 4  0  4  8 12 16

If a value is not in the range, it should change to NA.

Thanks!

Rivka
  • 1
    Could you please provide us with a reproducible example? This would make it easier to answer the question. :) – Daniel_Kuehn Jan 22 '17 at 21:45
  • 1
    And please also add an example of the desired output. – ekstroem Jan 22 '17 at 22:14
  • 2
    [Info on how to give a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Jan 23 '17 at 09:25

1 Answer

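Try mapply. It pairs each column of blood_df with the matching column of stat_df, whereas apply passes one column at a time while stat_df[3,2:5] still refers to the whole row of limits. Note also that `||` is not vectorized; the element-wise operator is `|`: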

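column_range = 2:5
# blood: one column of blood_df; stat: the matching column of stat_df
blood_df[, column_range] = mapply(function(blood, stat){
        # stat[3] is the upper limit (row "sum"), stat[6] the lower limit (row "sum2")
        ifelse((blood > stat[3]) | (blood < stat[6]), NA, blood)
    },
    blood_df[, column_range],
    stat_df[, column_range],
    SIMPLIFY = FALSE  # return a list of columns, assigned back column by column
)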
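
To see the replacement actually happen, the data needs more rows: with only four rows, no value can lie more than 1.5 sample standard deviations from its column mean, so the sample data in the question never produces an NA. The following is a minimal sketch of the same column-pairing idea using `Map()`. The names `demo_df`, `cols`, `lower`, and `upper` are hypothetical and introduced only for this illustration, and the limits are kept as plain named vectors rather than rows of stat_df.

```r
# Hypothetical data: 20 rows, with one extreme value in each checked column
demo_df <- data.frame(
  id = 1:20,
  a  = c(rep(10, 19), 1000),  # last value is far above the rest
  b  = c(-500, rep(5, 19))    # first value is far below the rest
)
cols <- c("a", "b")

# Per-column limits: mean +/- 3 * sd, one entry per checked column
upper <- sapply(demo_df[cols], function(x) mean(x, na.rm = TRUE) + 3 * sd(x, na.rm = TRUE))
lower <- sapply(demo_df[cols], function(x) mean(x, na.rm = TRUE) - 3 * sd(x, na.rm = TRUE))

# Pair each column with its own limits; values outside the range become NA
demo_df[cols] <- Map(
  function(x, lo, hi) ifelse(x < lo | x > hi, NA, x),
  demo_df[cols], lower, upper
)

print(demo_df[c(1, 20), ])
```

Only the two extreme values are replaced; every other cell keeps its original value.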
Gregory Demin
  • Actually you may want `SIMPLIFY` arg as `TRUE` (the default) to bind back to dataframe columns. By itself this is equivalent to `Map()` and returns a list of vectors. – Parfait Jan 23 '17 at 00:43
  • 2
    @Parfait The resulting list will be mapped to the original columns. Reproducible example: `iris[, 1:4] = lapply(iris[, 1:4], scale)`. And I think it is a more robust and faster method than conversion to a matrix and binding back to the data.frame. – Gregory Demin Jan 23 '17 at 09:39
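
A small sketch of the point discussed in these comments, using made-up data (`df`, `lims`, and `clip` are hypothetical names, not part of the original question). With `SIMPLIFY = FALSE`, `mapply` returns a list of columns; with the default it returns a matrix in this case. Either result can be assigned back to the data.frame columns.

```r
# Hypothetical data and limits (row 1 of lims = lower limit, row 2 = upper limit)
df   <- data.frame(id = 1:3, a = c(1, 50, 3), b = c(4, 5, -60))
lims <- data.frame(a = c(0, 10), b = c(0, 10))

# Replace values outside [lim[1], lim[2]] with NA
clip <- function(x, lim) ifelse(x < lim[1] | x > lim[2], NA, x)

as_list   <- mapply(clip, df[, 2:3], lims, SIMPLIFY = FALSE)  # list of two vectors
as_matrix <- mapply(clip, df[, 2:3], lims)                    # 3 x 2 matrix

# Both shapes can be assigned back to the same columns
df_list <- df
df_list[, 2:3] <- as_list
df_mat <- df
df_mat[, 2:3] <- as_matrix

print(df_list)
print(df_mat)
```

The list form avoids the intermediate matrix, which matters mainly when columns have different types.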