Since you are new to R, I assume some of the terminology may be a bit
confusing, so here is a short explanation of the if statement.
Let's look at the if condition:
m[,colSums(!is.na(m)) > 1, drop = FALSE]
     [,1] [,2]
[1,]    1   NA
[2,]    2   NA
[3,]   NA    4
[4,]   NA    5
[5,]    5   NA
This is not something that if can work with, because an if condition has to be
a single boolean (it must evaluate to TRUE or FALSE). So why this result? Well, the result of
colSums(!is.na(m))
[1] 3 1 2 0
is a vector counting the entries in each column that are not NA (= the number of TRUEs in each column of !is.na(m)). Be careful, as this is not the same as
colSums(m, na.rm = TRUE)
[1] 8 1 9 0
which returns the sum over all five rows of each column, excluding NAs. My guess is that the latter is what you are looking for. In any case, be aware of the difference!
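To see the two side by side, here is a minimal sketch. The matrix m below is an assumption on my part: it is built only so that it reproduces the outputs quoted above, and the row in which the single value of column 2 sits is a guess.

```r
# Hypothetical matrix, chosen only to match the outputs quoted in this answer.
# The position of the lone value in column 2 is a guess.
m <- cbind(c(1, 2, NA, NA, 5),
           c(NA, 1, NA, NA, NA),
           c(NA, NA, 4, 5, NA),
           rep(NA_real_, 5))

n_present  <- colSums(!is.na(m))        # how many non-NA entries per column
col_totals <- colSums(m, na.rm = TRUE)  # sum of the non-NA values per column

print(n_present)   # 3 1 2 0
print(col_totals)  # 8 1 9 0
```

The first call counts entries, the second adds up values; they only coincide by accident.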
By asking which of those counts is greater than 1, you do get a boolean vector:
colSums(!is.na(m)) > 1
[1] TRUE FALSE TRUE FALSE
However, when you use that boolean vector as a criterion for selecting columns, you correctly get a matrix, which is obviously not boolean:
m[,colSums(!is.na(m)) > 1]
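If what you need is a condition for if, the logical vector has to be reduced to a single TRUE/FALSE first. A minimal sketch follows, using a hypothetical m that merely reproduces the outputs quoted in this answer. Whether any() or all() is the right reduction depends on what you actually want to test, which I can only guess at. As far as I know, recent R versions (4.2.0 and later) raise an error when the condition has length greater than one, while older versions only warned and used the first element.

```r
# Hypothetical matrix, chosen only to match the outputs quoted in this answer.
m <- cbind(c(1, 2, NA, NA, 5),
           c(NA, 1, NA, NA, NA),
           c(NA, NA, 4, 5, NA),
           rep(NA_real_, 5))

keep <- colSums(!is.na(m)) > 1   # logical vector, one entry per column

# if needs exactly one TRUE/FALSE, so collapse the vector with any()
if (any(keep)) {
  msg <- "at least one column has more than one non-NA entry"
} else {
  msg <- "no column has more than one non-NA entry"
}
print(msg)
```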
Note: drop = FALSE is unnecessary here, since more than one column is selected and so there is no dimension that could be dropped. See ?"[" or ?drop. You can verify this using identical:
identical(m[,colSums(!is.na(m)) > 1, drop = FALSE],
m[,colSums(!is.na(m)) > 1])
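For contrast, drop = FALSE does matter as soon as the selection leaves a single column. The sketch below uses a hypothetical m (built only to match the outputs quoted in this answer) and a stricter threshold of 2, which is my own choice to force a one-column result.

```r
# Hypothetical matrix, chosen only to match the outputs quoted in this answer.
m <- cbind(c(1, 2, NA, NA, 5),
           c(NA, 1, NA, NA, NA),
           c(NA, NA, 4, 5, NA),
           rep(NA_real_, 5))

one_col <- colSums(!is.na(m)) > 2            # only column 1 qualifies

with_drop    <- m[, one_col]                 # collapses to a plain vector
without_drop <- m[, one_col, drop = FALSE]   # stays a 5 x 1 matrix

print(dim(with_drop))                        # NULL
print(dim(without_drop))                     # 5 1
print(identical(with_drop, without_drop))    # FALSE
```

So if later code relies on the result being a matrix, keeping drop = FALSE is the safer habit.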
Now to the loop. You will find tons of discussions on avoiding for loops and using the apply family of functions instead. I suspect you will have to take some time to go through all that. Note, however, that using apply, contrary to common belief, is not necessarily superior to a for loop in terms of speed, as it is actually just a fancy wrapper around a for loop (check the source code!). It is, however, clearly superior in terms of code clarity, as it is compact and clear about what it is doing. So do try to use apply functions if possible!
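As an illustration of the clarity argument, here is the same computation written both ways. This is only a sketch: I am assuming the goal is the mean of the non-NA values in each column, which may not be what your loop is meant to do, and m is a hypothetical matrix built to match the outputs quoted in this answer.

```r
# Hypothetical matrix, chosen only to match the outputs quoted in this answer.
m <- cbind(c(1, 2, NA, NA, 5),
           c(NA, 1, NA, NA, NA),
           c(NA, NA, 4, 5, NA),
           rep(NA_real_, 5))

# for loop: preallocate the result, then fill it column by column
means_loop <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  means_loop[j] <- mean(m[, j], na.rm = TRUE)
}

# apply: the same computation over the columns (MARGIN = 2)
means_apply <- apply(m, 2, mean, na.rm = TRUE)

print(means_loop)
print(means_apply)   # column 4 has no values at all, so its mean is NaN
```

Both give the same numbers; the apply version just says it in one line.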
In order to rewrite your loop, it would be helpful if you could describe in words
what you actually want to do, since I assume that what the loop
is doing right now is probably not what you want. As which()
returns the indices/positions of the TRUE elements of a logical vector or matrix, what you are basically
doing is:
indices of the i'th row that are not NA (for a given column) - mean over these indices
While this is theoretically possible, it usually doesn't make much sense. So, with all my notes at hand: clearly state your problem so we can think of a fix.
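To make the point about which() concrete, here is a small sketch. The vector x is hypothetical; its values mirror the second column of the output shown at the top of this answer.

```r
x <- c(NA, NA, 4, 5, NA)            # hypothetical column

pos <- which(!is.na(x))             # positions of the non-NA entries: 3 4

mean_of_positions <- mean(pos)      # 3.5, a mean of row numbers
mean_of_values    <- mean(x[pos])   # 4.5, a mean of the actual data

print(mean_of_positions)
print(mean_of_values)
```

If the mean of the values is what you are after, mean(x, na.rm = TRUE) gives the same 4.5 without going through which() at all.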