
I would like to traverse through rows of a matrix and perform some operations on data entries based on a condition.

Below is my code

m = matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)
if (m[,colSums(!is.na(m)) > 1, drop = FALSE]){
        for(i in 1:4){
              a = which(m[i,] != "NA") - mean(which(!is.na(m[i,])))
                for(j in 2:5){
                       b = which(m[j,] != "NA") - mean(which(!is.na(m[j,])))
                       prod(a,b)
                }
        }
}

I get a warning message as below in my "if" condition

Warning message:
In if (m[, colSums(!is.na(m)) > 1, drop = FALSE]) { :
  the condition has length > 1 and only the first element will be used

I know it returns a vector and that I should probably be using an ifelse block. How can I incorporate for loops inside an ifelse block? This may be a basic question; I am new to R.

Pratheek16
  • Try to use the "apply" family of functions instead of "if" or "ifelse". – Patric Dec 08 '15 at 07:09
  • BTW, you can use the "na.rm=TRUE" argument in lots of functions to ignore "NA". – Patric Dec 08 '15 at 07:14
  • @Patric - Can you please guide me on how I can use the "apply" function to check the condition? – Pratheek16 Dec 08 '15 at 07:52
  • Could you try to reorganize your code first? For example, "a = which(m[i,] != "NA") - mean(which(!is.na(m[i,])))" does not change inside the inner loop over "j", and "b" does not depend on the loop over "i", right? So neither needs to be computed inside the nested loop body. – Patric Dec 08 '15 at 08:07
  • To be honest, I was using R for a few months before I started to use the `apply` family of functions. I'd come from VBA where everything is done with for loops. But take some time to learn about `apply` etc. and you won't look back - much faster and simpler code. See for starters: http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega – CJB Dec 08 '15 at 09:27
  • I think this warning message is not your main problem. `m[, colSums(!is.na(m)) > 1, drop = FALSE]` doesn't return a boolean. Could you please describe what you're trying to achieve? – bluefish Dec 08 '15 at 10:50
  • @bluefish - I need to find the columns that have more than one data entry other than "NA" and then proceed to the calculation shown inside the loop. – Pratheek16 Dec 08 '15 at 17:25
  • @Pratheek16 - Hi, sorry, I still don't get what you're trying to achieve. [link](http://www.catb.org/esr/faqs/smart-questions.html#goal) If you just want to loop over the result of `m[, colSums(!is.na(m)) > 1]`, why not do `m_new <- m[, colSums(!is.na(m)) > 1]` and loop over `m_new`? – bluefish Dec 08 '15 at 21:06
  • @Bluefish - Hi, I tried the above and I am not getting any error. I see the result is 0 i.e prod(a,b). Ultimately what I need is - taking the product result from each column and then summing them up as the final output. I do not have to use another for loop for columns right? – Pratheek16 Dec 09 '15 at 03:19

2 Answers


Based on your description, you want to count the non-NA entries in each column of the matrix and then do something depending on the result (that is why you need an "if"/"ifelse" statement). You can implement this as below, putting the inner loops in a separate function.

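yourFunc <- function(x, data) {
  # x: number of non-NA entries in one column; data: the full matrix
  # do what you want here, e.g. run your loops on "data"
  # as a sample, just return whether the column passes the check
  if (x > 1) 1 else 0
}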

m = matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)
# use a function from the "apply" family here
sapply(colSums(!is.na(m)), yourFunc, data=m)
#[1] 1 0 1 0

Actually, I think you should reorganize your problem and simplify the code; the "ifelse with for loop" construct may be entirely unnecessary.
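As a rough sketch of that idea (the names keep and per_column are hypothetical and only stand in for your real per-column logic), the column check can be done once up front, so that only the qualifying columns are ever passed to the function:

```r
m <- matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)

# TRUE for columns with more than one non-NA entry
keep <- colSums(!is.na(m)) > 1

# hypothetical placeholder: replace the body with your own per-column calculation
per_column <- function(column) sum(!is.na(column))

# apply() with MARGIN = 2 calls per_column once for each remaining column
res <- apply(m[, keep, drop = FALSE], 2, per_column)
res
# [1] 3 2
```

No "if" is needed in this version, because the logical vector keep does the selection before the function is called.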

Patric
  • I don't want to sum the columns. I want to see which columns have more than one data entry other than "NA", and then perform operations on those columns only. – Pratheek16 Dec 08 '15 at 17:31
  • @Pratheek16 changed the code, check if it's what you want. – Patric Dec 08 '15 at 23:25
  • That is a nice way to put my looping logic in a function. So, when I include the looping logic in a function, will the sapply apply to each column ? – Pratheek16 Dec 09 '15 at 20:15
  • @Pratheek16, yes, but you can make a minor change so that only the TRUE columns are passed to the function. – Patric Dec 10 '15 at 00:00

As you are new to R, I assume that some of the terminology may be a bit confusing, so here is a short explanation of the if statement.

Let's look at the if condition:

m[,colSums(!is.na(m)) > 1, drop = FALSE]

      [,1] [,2]
[1,]    1   NA
[2,]    2   NA
[3,]   NA    4
[4,]   NA    5
[5,]    5   NA

This is nothing that if can work with, as an if condition has to be a single boolean (it must evaluate to one TRUE or FALSE). So why this result? Well, the result of

colSums(!is.na(m))
[1] 3 1 2 0

is a vector of counts of entries that are not NA (= the number of TRUEs in each column). Be careful, as this is not the same as

colSums(m, na.rm = TRUE)
[1] 8 1 9 0

which returns a vector of sums over all five rows for each column, excluding NAs. My guess is that the latter is what you are looking for. In any case: be aware of the difference!

By asking which of the non-NA counts is greater than 1, you do get a boolean vector

colSums(!is.na(m)) > 1
[1]  TRUE FALSE  TRUE FALSE

However, using that boolean vector as a criterion for selecting columns, you correctly get a matrix, which is obviously not a boolean:

m[,colSums(!is.na(m)) > 1]
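As a minimal sketch of how the condition could be made a single TRUE/FALSE (the names keep and m_new are introduced here for illustration only), the boolean vector can be collapsed with any() before it is handed to if, while the column selection is done as a separate step:

```r
m <- matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)

# one TRUE/FALSE per column
keep <- colSums(!is.na(m)) > 1

# any() reduces the vector to a single TRUE/FALSE, which is what if expects
if (any(keep)) {
  m_new <- m[, keep, drop = FALSE]   # only the qualifying columns
} else {
  m_new <- NULL                      # nothing to process
}

dim(m_new)
# [1] 5 2
```

Whether any() or all() is the right reduction depends on what you want the check to mean; any() is assumed here because the goal stated in the comments is to process whichever columns qualify.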

Note: drop = FALSE is unnecessary here, as two columns are selected and so no dimension can be dropped. See ?"[" or ?drop. You can verify this using identical:

identical(m[,colSums(!is.na(m)) > 1, drop = FALSE],
          m[,colSums(!is.na(m)) > 1])
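That holds for this particular m. As an illustration of when drop = FALSE would matter, here is a sketch with a stricter, made-up threshold (> 2 instead of > 1) so that only one column qualifies; the variable names are hypothetical:

```r
m <- matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)

# with this threshold only the first column qualifies
one_col <- colSums(!is.na(m)) > 2

as_vector <- m[, one_col]                 # the column dimension is dropped
as_matrix <- m[, one_col, drop = FALSE]   # stays a 5 x 1 matrix

is.matrix(as_vector)
# [1] FALSE
dim(as_matrix)
# [1] 5 1
```

Code that later indexes the result as m_new[i, ] would fail on the plain vector, so keeping drop = FALSE is a reasonable defensive habit if the number of selected columns is not known in advance.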

Now to the loop. You will find tons of discussions on avoiding for loops and using the apply family of functions. I suspect you will have to take some time to go through all that. Note, however, that using apply - contrary to common belief - is not necessarily superior to a for loop in terms of speed, as it is actually just a fancy wrapper around a for loop (check the source code!). It is, however, clearly superior in terms of code clarity, as it is compact and clear about what it is doing. So do try to use apply functions if possible!
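To illustrate the clarity point, here is a small sketch that computes the same per-column count of non-NA entries twice, once with an explicit for loop and once with apply (the variable names are made up for this example):

```r
m <- matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)

# explicit loop: preallocate the result, then fill it column by column
counts_loop <- integer(ncol(m))
for (k in seq_len(ncol(m))) {
  counts_loop[k] <- sum(!is.na(m[, k]))
}

# apply: MARGIN = 2 means "call the function once per column"
counts_apply <- apply(m, 2, function(column) sum(!is.na(column)))

all(counts_loop == counts_apply)
# [1] TRUE
```

Both versions do the same work; the apply version simply states the intent in one line.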

In order to rewrite your loop, it would be helpful if you could describe in words what you actually want to do, since I assume that what the loop is doing right now is probably not what you want. As which() returns the index/position of an element in a vector or matrix, what you are basically doing is:

indices of the i'th row that are not NA (for a given column) - mean over these indices

While this is theoretically possible, it usually doesn't make much sense. So, with all my notes at hand: clearly state your problem so we can think of a fix.
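To make the point about which() concrete, here is a small sketch on the third row of m. It contrasts centering the positions (what the loop currently computes) with centering the values, which is only my assumption about what might be intended; the variable names are made up:

```r
m <- matrix(c(1,2,NA,NA,5,NA,NA,1,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,NA), nrow = 5, ncol = 4)

row3 <- m[3, ]                  # NA 1 4 NA

# which() returns positions, not values
idx <- which(!is.na(row3))      # 2 3
pos_centered <- idx - mean(idx)
pos_centered
# [1] -0.5  0.5

# assumed alternative: center the values themselves, ignoring NA in the mean
val_centered <- row3 - mean(row3, na.rm = TRUE)
val_centered
# [1]   NA -1.5  1.5   NA
```

The first result depends only on where the non-NA entries sit in the row, not on what they are, which is why the current loop is probably not computing what you expect.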

Manuel R
  • Manuel - Thanks for the explanation! I do not want to sum the columns, I just want the former - process only those columns that have more than one data entry other than "NA". When that is satisfied, I will compare each row with another for those columns. Inside the loop, I would like to take (ith element - mean(ith row of the resultant matrix)) * (jth element - mean(jth row of the resultant matrix)). The resultant matrix is the one I get from m[,colSums(!is.na(m)) > 1, drop = FALSE] – Pratheek16 Dec 08 '15 at 17:45
  • The multiplication part is still unclear to me. What is the result you expect? Just one number (this is what `prod` would give you) or do you expect a matrix where each centered element (`(ith element - mean(ith row of the resultant matrix))`) with `i = 1,...,4` is multiplied with all other centered elements of all other `j = 1,..., 4` rows? Please be more precise as to what you want and what kind of result you expect. A single number, a vector, a matrix? – Manuel R Dec 09 '15 at 13:24
  • Manuel - it will be a matrix with 1X2. So, for the above example I will have two elements in them - i.e. running the loop ((ith element - mean(ith row of the resultant matrix))) with i = 1,...,4 is multiplied with all other centered elements of all other j = 2,..., 5 rows) for each column. Next, I would like to have the sum(results from the two columns), which would be a single number. I hope it's clear now. – Pratheek16 Dec 09 '15 at 15:40