I've been trying to find a solution for this around the Internet, but didn't manage to solve my specific problem. I have a matrix D holding the minimum and maximum x and y coordinates of a group of rectangles (one rectangle per column; rows 1 and 2 are the minimum and maximum x, rows 3 and 4 the minimum and maximum y). Now I want to find out how often some of them intersect. The method I thought of to evaluate whether two of them intersect needs to be run over all possible combinations of columns (also "backwards", since the test is not symmetric). The first intuitive solution for me was to nest one for-loop going through the columns inside another one doing the same thing. Here is an example:
n <- 5
## rows of D: 1 = left edge (min x), 2 = right edge (max x),
##            3 = lower edge (min y), 4 = upper edge (max y)
D <- matrix(c(1,2,1,2, 1,3,1,3, 2,4,2,4, 3,5,1,2, 3,4,2,4), nrow = 4, ncol = n)
E <- mat.or.vec(nr = n, nc = n)  # n x n matrix of zeros

for (i in 1:n) {
  for (j in 1:n) {
    if (i != j &&
        (D[1, i] <= D[1, j] & D[1, j] < D[2, i]) &&   # x ranges overlap
        ((D[3, i] <= D[3, j] & D[3, j] < D[4, i]) |   # y ranges overlap
         (D[3, i] < D[4, j] & D[4, j] <= D[4, i]))) {
      E[i, j] <- 1
    }
  }
}
E then looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    0    1    0    0    0
[2,]    1    0    1    0    0
[3,]    0    0    0    0    1
[4,]    0    0    0    0    0
[5,]    0    0    0    0    0
I have been trying various alternative solutions, including foreach, outer, different versions of apply, and combinations of them. I have found lots of versions that give me the same correct result; however, for some reason the double for loop performed a lot faster than any of the other methods.
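For reference, a sketch of one such outer-based version (not necessarily the exact code benchmarked, but a vectorization of the same condition as the loop):

## each outer() call builds an n x n logical matrix of pairwise comparisons
xo  <- outer(D[1, ], D[1, ], `<=`) & outer(D[2, ], D[1, ], `>`)   # D[1,i] <= D[1,j] <  D[2,i]
yo1 <- outer(D[3, ], D[3, ], `<=`) & outer(D[4, ], D[3, ], `>`)   # D[3,i] <= D[3,j] <  D[4,i]
yo2 <- outer(D[3, ], D[4, ], `<`)  & outer(D[4, ], D[4, ], `>=`)  # D[3,i] <  D[4,j] <= D[4,i]
E2 <- (xo & (yo1 | yo2)) * 1
diag(E2) <- 0  # exclude i == j

This should reproduce E from the loop above.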
The problem is that there are about 4000 columns in the matrix, and I have to run the whole resulting function about 50 times, so the time the loop takes is still too long. Can anyone tell me why the other solutions perform even worse (I thought loops were the worst thing to do in R) and maybe suggest something that will work fast with a lot of data?
Edit: I tried out the two suggested solutions with 918 rectangles and benchmarked them.
library(sp)
library(rgeos)

## corner coordinates of each rectangle, one row per rectangle:
## (xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)
D1 <- t(D)
D1 <- D1[, c(1,3, 2,3, 2,4, 1,4)]
ID <- paste0('rect', seq_len(nrow(D1)))

polys <- SpatialPolygons(mapply(function(poly, id) {
  xy <- matrix(poly, ncol = 2, byrow = TRUE)
  Polygons(list(Polygon(xy)), ID = id)
}, split(D1, row(D1)), ID))
polys.df <- SpatialPolygonsDataFrame(polys, data.frame(id = ID, row.names = ID))

res <- gOverlaps(polys.df, byid = TRUE) |
  gContains(polys.df, byid = TRUE) |
  gWithin(polys.df, byid = TRUE)
diag(res) <- 0
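Since the goal is to count intersections, the per-rectangle counts can then be read straight off res (a trivial follow-up of my own, not part of the suggested answer):

rowSums(res)  # number of other rectangles each rectangle intersects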
intersecting_rects <- function(edge1, edge2) {
  #* index 1 is left edge, index 2 is right edge
  #* index 3 is lower edge, index 4 is upper edge
  (edge1[1] < edge2[2] && edge1[2] > edge2[1] &&
   edge1[3] < edge2[4] && edge1[4] > edge2[3])
}
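Note that, unlike the asymmetric condition in my original loop, this is a symmetric overlap test. A quick sanity check on the small example matrix D from the top of the post:

intersecting_rects(D[, 1], D[, 2])  # TRUE: x and y ranges both overlap
intersecting_rects(D[, 1], D[, 4])  # FALSE: rect 1 ends at x = 2, rect 4 starts at x = 3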
library(microbenchmark)

microbenchmark(
  poly = {
    res <- gOverlaps(polys.df, byid = TRUE) |
      gContains(polys.df, byid = TRUE) |
      gWithin(polys.df, byid = TRUE) |
      gCovers(polys.df, byid = TRUE)
    diag(res) <- 0
  },
  full_loop = {
    E <- mat.or.vec(nr = n, nc = n)
    for (i in 1:n) {
      for (j in 1:n) {
        E[i, j] <- as.numeric(intersecting_rects(D[, i], D[, j]))
      }
    }
  },
  partial_loop = {  # test each unordered pair once, then mirror
    E <- mat.or.vec(nr = n, nc = n)
    for (i in 1:n) {
      for (j in (i:n)[-1]) {
        E[i, j] <- as.numeric(intersecting_rects(D[, i], D[, j]))
      }
    }
    E[lower.tri(E)] <- t(E)[lower.tri(E)]
  }
)
Unit: seconds
         expr      min       lq      mean    median        uq       max neval
         poly 5.280785 5.985167  6.277424  6.288105  6.567685  7.141447   100
    full_loop 8.328631 9.700908 10.091188 10.112840 10.490162 11.450817   100
 partial_loop 4.335070 4.921649  5.203672  5.188610  5.503550  6.000614   100
We see that the partial loop is already a significant improvement over the full one, but it is only slightly better than the solution with spatial objects. Therefore I settled on Roland's solution. Thanks again!