I've been trying to find a solution for this around the Internet, but didn't manage to solve my specific problem. I have a matrix D holding the minimum and maximum x and y coordinates of a group of rectangles (one rectangle per column; rows 1 and 2 are the minimum and maximum x, rows 3 and 4 the minimum and maximum y). Now I want to find out how often some of them intersect. The method I thought of to evaluate whether two of them intersect needs to be run over all possible combinations of columns (also "backwards", since the test is not symmetric). The first intuitive solution for me was to nest one for-loop going through the columns inside another one doing the same thing. Here is an example:
n <- 5
## rows of D: 1 = left edge (min x), 2 = right edge (max x),
##            3 = lower edge (min y), 4 = upper edge (max y)
D <- matrix(c(1,2,1,2, 1,3,1,3, 2,4,2,4, 3,5,1,2, 3,4,2,4), nrow = 4, ncol = n)
E <- mat.or.vec(nr = n, nc = n)  # n x n matrix of zeros

for (i in 1:n) {
  for (j in 1:n) {
    if (i != j &&
        (D[1, i] <= D[1, j] & D[1, j] < D[2, i]) &&   # x ranges overlap
        ((D[3, i] <= D[3, j] & D[3, j] < D[4, i]) |   # y ranges overlap
         (D[3, i] < D[4, j] & D[4, j] <= D[4, i]))) {
      E[i, j] <- 1
    }
  }
}
E then looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    0    1    0    0    0
[2,]    1    0    1    0    0
[3,]    0    0    0    0    1
[4,]    0    0    0    0    0
[5,]    0    0    0    0    0
I have been trying various alternative solutions, including foreach, outer, different versions of apply, and combinations of them. I have found lots of versions that give me the same correct result; however, for some reason the double for loop performed a lot faster than any of the other methods.
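For reference, a sketch of one such outer-based version (not necessarily the exact code benchmarked, but a vectorization of the same condition as the loop):

## each outer() call builds an n x n logical matrix of pairwise comparisons
xo  <- outer(D[1, ], D[1, ], `<=`) & outer(D[2, ], D[1, ], `>`)   # D[1,i] <= D[1,j] <  D[2,i]
yo1 <- outer(D[3, ], D[3, ], `<=`) & outer(D[4, ], D[3, ], `>`)   # D[3,i] <= D[3,j] <  D[4,i]
yo2 <- outer(D[3, ], D[4, ], `<`)  & outer(D[4, ], D[4, ], `>=`)  # D[3,i] <  D[4,j] <= D[4,i]
E2 <- (xo & (yo1 | yo2)) * 1
diag(E2) <- 0  # exclude i == j

This should reproduce E from the loop above.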
The problem is that there are about 4000 columns in the matrix, and I have to run the whole resulting function about 50 times, so the time the loop takes is still too long. Can anyone tell me why the other solutions perform even worse (I thought loops were the worst thing to do in R) and maybe suggest something that will work fast with a lot of data?
Edit: I tried out the two suggested solutions with 918 rectangles and benchmarked them.
library(sp)
library(rgeos)

## corner coordinates of each rectangle, one row per rectangle:
## (xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)
D1 <- t(D)
D1 <- D1[, c(1,3, 2,3, 2,4, 1,4)]
ID <- paste0('rect', seq_len(nrow(D1)))

polys <- SpatialPolygons(mapply(function(poly, id) {
  xy <- matrix(poly, ncol = 2, byrow = TRUE)
  Polygons(list(Polygon(xy)), ID = id)
}, split(D1, row(D1)), ID))
polys.df <- SpatialPolygonsDataFrame(polys, data.frame(id = ID, row.names = ID))

res <- gOverlaps(polys.df, byid = TRUE) |
  gContains(polys.df, byid = TRUE) |
  gWithin(polys.df, byid = TRUE)
diag(res) <- 0
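Since the goal is to count intersections, the per-rectangle counts can then be read straight off res (a trivial follow-up of my own, not part of the suggested answer):

rowSums(res)  # number of other rectangles each rectangle intersects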
intersecting_rects <- function(edge1, edge2) {
  #* index 1 is left edge, index 2 is right edge
  #* index 3 is lower edge, index 4 is upper edge
  (edge1[1] < edge2[2] && edge1[2] > edge2[1] &&
   edge1[3] < edge2[4] && edge1[4] > edge2[3])
}
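Note that, unlike the asymmetric condition in my original loop, this is a symmetric overlap test. A quick sanity check on the small example matrix D from the top of the post:

intersecting_rects(D[, 1], D[, 2])  # TRUE: x and y ranges both overlap
intersecting_rects(D[, 1], D[, 4])  # FALSE: rect 1 ends at x = 2, rect 4 starts at x = 3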
library(microbenchmark)

microbenchmark(
  poly = {
    res <- gOverlaps(polys.df, byid = TRUE) |
      gContains(polys.df, byid = TRUE) |
      gWithin(polys.df, byid = TRUE) |
      gCovers(polys.df, byid = TRUE)
    diag(res) <- 0
  },
  full_loop = {
    E <- mat.or.vec(nr = n, nc = n)
    for (i in 1:n) {
      for (j in 1:n) {
        E[i, j] <- as.numeric(intersecting_rects(D[, i], D[, j]))
      }
    }
  },
  partial_loop = {  # test each unordered pair once, then mirror
    E <- mat.or.vec(nr = n, nc = n)
    for (i in 1:n) {
      for (j in (i:n)[-1]) {
        E[i, j] <- as.numeric(intersecting_rects(D[, i], D[, j]))
      }
    }
    E[lower.tri(E)] <- t(E)[lower.tri(E)]
  }
)
Unit: seconds
         expr      min       lq      mean    median        uq       max neval
         poly 5.280785 5.985167  6.277424  6.288105  6.567685  7.141447   100
    full_loop 8.328631 9.700908 10.091188 10.112840 10.490162 11.450817   100
 partial_loop 4.335070 4.921649  5.203672  5.188610  5.503550  6.000614   100
We see that the partial loop is already a significant improvement over the full one, but it is only slightly better than the solution with spatial objects. Therefore I settled on Roland's solution. Thanks again!