
I have an n x n symmetrical binary matrix and I want to find the largest rectangle (by area) with 0 at the top-left and bottom-right corners and 1 at the top-right and bottom-left corners. If I just do it with loops, checking all the rectangles from the biggest to the smallest, it takes "days" for n = 100. Does anyone have an idea how to do it efficiently?
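To make the condition concrete: rows i < j and columns k < l delimit a valid rectangle exactly when the four corner entries match the required pattern. A minimal check (the helper name is only illustrative):

## Hypothetical helper: TRUE when rows i < j and columns k < l
## delimit a rectangle with the required corner pattern.
is_valid_rect <- function(M, i, j, k, l) {
    M[i, k] == 0 && M[j, l] == 0 &&   # 0 at top-left and bottom-right
        M[i, l] == 1 && M[j, k] == 1  # 1 at top-right and bottom-left
}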

Thanks a lot!

    Hi Bertrand. Could you give us a concrete example? Would a method that works on a 100 x 100 non-symmetrical binary matrix made of randomly sampled 0s and 1s suffice? – Allan Cameron Feb 23 '22 at 16:35

2 Answers


Thanks for your answers. The matrices I use are adjacency matrices of random Erdős–Rényi graphs (see the generation sketch after the loop code below), but one can take any random symmetrical binary matrix to test it. So far, I have used 4 nested loops:

## Note: this masks base::switch; the name is kept from the original post.
switch <- function(Mat)
{
    n <- nrow(Mat)
    for (i in 1:(n - 1)) {                   # top row
        for (j in seq(n, i + 1, by = -1)) {  # bottom row, largest span first
            for (k in 1:(n - 1)) {           # left column
                ## need Mat[i, k] == 0 (top-left) and Mat[j, k] == 1 (bottom-left)
                if (k == i || k == j || Mat[i, k] == 1 || Mat[j, k] == 0) next
                for (l in seq(n, k + 1, by = -1)) {  # right column, largest span first
                    ## need Mat[i, l] == 1 (top-right) and Mat[j, l] == 0 (bottom-right)
                    if (l == i || l == j || Mat[i, l] == 0 || Mat[j, l] == 1) next
                    return(c(i = i, j = j, k = k, l = l))
                }
            }
        }
    }
    NULL  # no rectangle found
}
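For reference, such a test matrix can be generated, e.g., with the igraph package (a sketch; any random symmetrical binary matrix works just as well):

## Sketch assuming the igraph package is available.
library(igraph)
g <- sample_gnp(100, p = 0.5)              # random Erdős–Rényi G(n, p) graph
Mat <- as.matrix(as_adjacency_matrix(g))   # symmetric 0/1 adjacency matrix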
  • The problem is that when you begin with a random matrix you can find the largest rectangle very quickly, but if you replace the 0s and 1s you have found by 1s and 0s and iterate the function 'switch' (as in the sketch below), eventually there are no more such rectangles near the borders of the matrix and you have to go deeper within the loops to find one, which takes a lot of time. – Bertrand Jouve Feb 23 '22 at 17:07
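A sketch of the flip-and-iterate procedure described in the comment above (the wrapper name iterate_switch is hypothetical):

iterate_switch <- function(Mat) {
    repeat {
        r <- switch(Mat)   # the four-loop search above; NULL when no rectangle remains
        if (is.null(r)) break
        i <- r[["i"]]; j <- r[["j"]]; k <- r[["k"]]; l <- r[["l"]]
        ## swap the 0s and 1s at the four corners that were found
        ## (for a symmetric matrix one would presumably also flip the
        ## mirrored entries, e.g. Mat[k, i] alongside Mat[i, k])
        Mat[i, k] <- 1; Mat[j, k] <- 0
        Mat[i, l] <- 0; Mat[j, l] <- 1
    }
    Mat
}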

Here's an approach that you can try for now. It doesn't require symmetry, and it treats all nonzero elements like ones for efficiency.

It loops over the ones, assuming that there are fewer ones than zeros. (You would want to loop over zeros in the reverse case with fewer zeros than ones.)

This approach probably isn't optimal, since it loops over all of the ones even if the largest box is identified early. You could devise a stopping condition to short-circuit the loop in that case (see the sketch after the benchmark below). But it is still fast for n = 100, requiring less than half a second on my machine, even when ones and zeros occur in roughly equal proportion (the worst case):

f <- function(X) {
    if (!is.logical(X)) {
        storage.mode(X) <- "logical"   # treat all nonzero elements as TRUE
    }
    ## positions of all ones
    J <- which(X, arr.ind = TRUE, useNames = FALSE)
    i <- J[, 1L]
    j <- J[, 2L]
    nmax <- 0L   # area of the best rectangle so far
    res <- NULL
    for (k in seq_along(i)) {
        ## take the one at (i0, j0) as the bottom-left corner ...
        i0 <- i[k]
        j0 <- j[k]
        ## ... and every one strictly above and to the right of it
        ## as a candidate top-right corner (i1, j1)
        ok <- i < i0 & j > j0
        if (any(ok)) {
            i1 <- i[ok]
            j1 <- j[ok]
            ## keep candidates whose other two corners are zeros
            ok <- !(X[i0, j1] | X[i1, j0])
            if (any(ok)) {
                i1 <- i1[ok]
                j1 <- j1[ok]
                ## areas of the surviving rectangles
                n <- (i0 - i1 + 1L) * (j1 - j0 + 1L)
                w <- which.max(n)
                if (n[w] > nmax) {
                    nmax <- n[w]
                    res <- c(i0 = i0, j0 = j0, i1 = i1[w], j1 = j1[w])
                }
            }
        }
    }
    res
}
mkX <- function(n) {
    ## random symmetric binary matrix with ones and zeros
    ## in roughly equal proportion
    X <- matrix(sample(0:1, n * n, TRUE), n, n)
    X[upper.tri(X)] <- t(X)[upper.tri(X)]
    X
}

set.seed(1L)
X <- mkX(6L)
X
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    0    0    1    0
## [2,]    1    0    1    1    0    0
## [3,]    0    1    0    1    1    1
## [4,]    0    1    1    0    0    0
## [5,]    1    0    1    0    0    1
## [6,]    0    0    1    0    1    0

f(X)
## i0 j0 i1 j1 
##  5  1  1  5 
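Here (i0, j0) = (5, 1) is the bottom-left one and (i1, j1) = (1, 5) the top-right one; the other two corners, X[1, 1] and X[5, 5], are both 0, so the rectangle spans rows 1:5 and columns 1:5 with area 25.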
Y <- mkX(100L)
microbenchmark::microbenchmark(f(Y))
## Unit: milliseconds
##  expr     min       lq     mean   median       uq      max neval
##  f(Y) 310.139 318.3363 327.8116 321.4109 326.5088 391.9081   100
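As a sketch of the short-circuiting idea mentioned above (an addition, not part of the answer's code): a one at (i0, j0) used as the bottom-left corner can never enclose more than i0 * (ncol(X) - j0 + 1L) cells, so the ones can be processed in decreasing order of that bound and the loop stopped as soon as the bound can no longer beat the current best. The name f2 is hypothetical:

f2 <- function(X) {
    if (!is.logical(X)) {
        storage.mode(X) <- "logical"
    }
    J <- which(X, arr.ind = TRUE, useNames = FALSE)
    i <- J[, 1L]
    j <- J[, 2L]
    ## upper bound on the area achievable with (i[k], j[k]) as the
    ## bottom-left corner: the rectangle reaches at most row 1 and
    ## the last column
    b <- i * (ncol(X) - j + 1L)
    o <- order(b, decreasing = TRUE)
    i <- i[o]
    j <- j[o]
    b <- b[o]
    nmax <- 0L
    res <- NULL
    for (k in seq_along(i)) {
        if (b[k] <= nmax) {
            break   # no remaining corner can beat the current best
        }
        i0 <- i[k]
        j0 <- j[k]
        ok <- i < i0 & j > j0
        if (any(ok)) {
            i1 <- i[ok]
            j1 <- j[ok]
            ok <- !(X[i0, j1] | X[i1, j0])
            if (any(ok)) {
                i1 <- i1[ok]
                j1 <- j1[ok]
                n <- (i0 - i1 + 1L) * (j1 - j0 + 1L)
                w <- which.max(n)
                if (n[w] > nmax) {
                    nmax <- n[w]
                    res <- c(i0 = i0, j0 = j0, i1 = i1[w], j1 = j1[w])
                }
            }
        }
    }
    res
}

The gain depends on how early a large rectangle is found; in the worst case f2 does the same work as f plus a sort.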
Mikael Jagan