3

I have a numeric matrix of 30,000 rows and 3 columns. I would like to generate a simple PASS/FAIL vector (or factor) based on the 3 values in each row of the matrix. I would like to apply the following logic:

If all 3 values in row > 3, enter PASS, else FAIL.

I know how to do this with a for loop, but how could I do it faster? I have dozens of these matrices... Thank you!

as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))

desired output: PASS, PASS, FAIL, FAIL

reviewer3
  • I want to thank everyone that answered here! and I'm sorry I wasn't more precise about the PASS/FAIL - I understand that some solutions were written just to make the output fit that req. Thank you for taking the time to answer, I learned about vectorized functions, logical vectors, ifelse and apply. Going on stackexchange is always a good idea. Thank you ALL! – reviewer3 Oct 22 '13 at 22:37

5 Answers

5

Use all and apply (though apply runs its own loop internally).

m <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))

apply(m, 1, function(x) all(x > 3))
# [1]  TRUE  TRUE FALSE FALSE

If you really want "PASS" and "FAIL" instead, you can factor the result of the apply step.

factor(apply(m, 1, function(x) all(x > 3)), 
       levels = c(FALSE, TRUE), 
       labels = c("FAIL", "PASS"))
# [1] PASS PASS FAIL FAIL
# Levels: FAIL PASS

Extending Codoremifa's answer a little, a similar approach works with data.table, especially since you specify that you want a vector or factor as the output.

library(data.table)
DT <- data.table(m)
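# Setting levels explicitly keeps the labels correct even if every row passes (or every row fails)
DT[, all(.SD > 3), by = 1:nrow(DT)][, factor(V1, levels = c(FALSE, TRUE), labels = c("FAIL", "PASS"))]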
# [1] PASS PASS FAIL FAIL
# Levels: FAIL PASS
A5C1D2H2I1M1N2O1R2T1
  • I thought the data.table construct was really neat but I set up a 500000 rows `data.table` and it takes really long. Is it because of the 'by'? Any workaround? – TheComeOnMan Oct 23 '13 at 04:55
  • @Codoremifa, not really sure. How long is really long? :) – A5C1D2H2I1M1N2O1R2T1 Oct 23 '13 at 05:00
  • Timing on 500,000 rows:
        dt <- data.table(
          V1 = round(rnorm(500000, 0, 1), 1),
          V2 = round(rnorm(500000, 0, 1), 1),
          V3 = round(rnorm(500000, 0, 1), 1)
        )
        system.time(dt[V1 > .7 & V2 > .7 & V3 > .7, Indicator := "PASS"])
        #    user  system elapsed
        #    0.14    0.00    0.14
        system.time(dt[, all(.SD > .7), by = 1:nrow(dt)][, factor(V1, labels = c("FAIL", "PASS"))])
        #    user  system elapsed
        #   75.61    0.05   76.25
    – TheComeOnMan Oct 23 '13 at 05:11
  • @Codoremifa, I'm guessing it has to do with using `.SD` row-by-row. See [here](http://stackoverflow.com/a/16865506/1270695), but I haven't thought of a workaround yet. – A5C1D2H2I1M1N2O1R2T1 Oct 23 '13 at 05:37
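
One possible workaround for the slow row-by-row grouping discussed in the comments above is to compare each column as a whole and then combine the results, so no `by` is needed. This is only a minimal sketch, assuming the same `m` as in the answer; the name `all_above` is hypothetical, and no timings are claimed for it.

```r
library(data.table)

m <- as.matrix(rbind(c(129,129,120), c(135,97,96), c(0,0,0), c(39,4,2)))
DT <- data.table(m)

# Compare each column to 3 in one vectorized step, then AND the
# resulting logical vectors together element by element.
all_above <- DT[, Reduce(`&`, lapply(.SD, function(col) col > 3))]

# Same PASS/FAIL factor as before, built from the logical vector
factor(all_above, levels = c(FALSE, TRUE), labels = c("FAIL", "PASS"))
```

Because nothing is grouped by row, `.SD` is evaluated once for the whole table rather than once per row.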
4

Unlike the other answers here, this one uses rowSums. Its loop runs in compiled code rather than in R, so it can outrun multiple subsets and logical comparisons. It's probably the fastest route.

mat <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))

vec <- ifelse(rowSums(mat > 3) == 3, TRUE, FALSE)

We could also bypass ifelse and make it even faster.

vec <- rowSums(mat > 3) == 3

If you time these, this will probably be the winner. On my system, using 30,000-row matrices, my first version comes out about twice as fast as gung's answer. The second comes out about 10x as fast and can process 1,000 30,000-row matrices in about 2 seconds. Codoremifa's answer is the fastest data.table-based answer here, and it takes 20s (similar to gung's answer).
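
To check timings like these on your own machine, something along the following lines could be used. It is a rough sketch on random data: the matrix `big`, the seed, and the value range are arbitrary assumptions, and the numbers you get will differ from those quoted above.

```r
set.seed(1)  # arbitrary seed, only so the random data is reproducible
big <- matrix(runif(30000 * 3, min = 0, max = 10), ncol = 3)

# Row-by-row version: apply() calls the function once per row
t_apply <- system.time(
  res_apply <- apply(big, 1, function(x) all(x > 3))
)

# Vectorized version: one comparison on the whole matrix, then rowSums()
t_rowsums <- system.time(
  res_rowsums <- rowSums(big > 3) == ncol(big)
)

# Both approaches should agree on every row
stopifnot(identical(res_apply, res_rowsums))

print(t_apply)
print(t_rowsums)
```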

NOTE: I kind of ignored your request for a "PASS", "FAIL" vector since you seemed to indicate speed was of paramount importance and it's a trivial semantic distinction. Furthermore, the logical vector is already prepared to subset the matrices if necessary.
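
Since the question mentions dozens of matrices, the one-liner can be wrapped in a small function and mapped over a list. This is a sketch only: `passes`, `threshold`, and the list `mats` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: TRUE for rows where every value exceeds `threshold`
passes <- function(m, threshold = 3) {
  rowSums(m > threshold) == ncol(m)
}

mat <- as.matrix(rbind(c(129,129,120), c(135,97,96), c(0,0,0), c(39,4,2)))

# A small list standing in for the "dozens of matrices"
mats <- list(a = mat, b = mat[4:1, ])

# One logical vector per matrix
results <- lapply(mats, passes)

# Convert to PASS/FAIL labels only where they are actually needed
ifelse(results$a, "PASS", "FAIL")

# The logical vector can subset the matrix directly
mat[results$a, , drop = FALSE]
```

Using `ncol(m)` instead of a hard-coded 3 lets the same helper handle matrices with a different number of columns.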

John
  • This is exactly what I was looking for and you were right, the PASS/FAIL was just an idea. Didn't know about logical vectors, very useful! Thank you - I used the 2 line solution-blazing! – reviewer3 Oct 22 '13 at 22:26
  • @Stefan, I believe you can just do this with one line: `rowSums(m > 3) == ncol(m)`. – A5C1D2H2I1M1N2O1R2T1 Oct 23 '13 at 02:00
  • right…updated the answer because initially I had written things with PASS/FAIL and hadn't fully converted but...why use ncol? – John Oct 23 '13 at 02:26
  • @John, just as a way of automating things (so we don't have to first count how many columns there are in the original matrix). – A5C1D2H2I1M1N2O1R2T1 Oct 23 '13 at 04:33
2
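library(data.table)
dt <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))

# Columns are named V1, V2, V3 by default
dt <- data.table(dt)
# Start with every row marked FAIL
dt[, Indicator := "FAIL"]
# Overwrite with PASS only where all three values exceed 3
dt[V1 > 3 & V2 > 3 & V3 > 3, Indicator := "PASS"]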
TheComeOnMan
2

Also, mapply:

mat <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))

fun <- function(x, y, z) { ifelse(x > 3 & y > 3 & z > 3, "PASS", "FAIL") } 
mapply(fun, mat[,1], mat[,2], mat[,3])
#[1] "PASS" "PASS" "FAIL" "FAIL"
alexis_laz
1

For problems like this, my first inclination is to combine ?all, ?apply, & ?ifelse, perhaps like the solution @Ananda provides. As he mentions, apply() is using a loop. If you want a completely vectorized solution, you could try:

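# xMatrix is the example matrix from the question
xMatrix <- as.matrix(rbind(c(129,129,120),c(135,97,96),c(0,0,0),c(39,4,2)))

newVector <- ifelse((xMatrix[,1]>3 & xMatrix[,2]>3 & xMatrix[,3]>3), 
                    "PASS", "FAIL")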

Vectorization is a handy feature of R, and it is usually much faster than an explicit loop written in R. You can read about vectorization here.
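
To make the contrast concrete, here is a small sketch that puts an explicit loop next to the vectorized call and checks that they agree. The names `loopResult` and `vecResult` are made up for this illustration, and the loop is only a guess at the kind of for loop mentioned in the question.

```r
xMatrix <- as.matrix(rbind(c(129,129,120), c(135,97,96), c(0,0,0), c(39,4,2)))

# Explicit loop: one iteration per row
loopResult <- character(nrow(xMatrix))
for (i in seq_len(nrow(xMatrix))) {
  loopResult[i] <- if (all(xMatrix[i, ] > 3)) "PASS" else "FAIL"
}

# Vectorized: each comparison works on a whole column at once
vecResult <- ifelse(xMatrix[, 1] > 3 & xMatrix[, 2] > 3 & xMatrix[, 3] > 3,
                    "PASS", "FAIL")

# Both give the same labels
identical(loopResult, vecResult)
```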

gung - Reinstate Monica