4

I have a list of coordinates that form axis-aligned 2D boxes (axis-oriented/iso-oriented rectangles, rects).
I want to see how many boxes intersect each other without repeating it. For example, I have boxes A, B, and C. If A intersects with B, B intersecting with A will still count as one intersection instead of two separate ones.

The way I'm thinking is to grab one box, compare it to all the rest and then throw it out since the comparison is done and I want no repeats. For example going back to A, B and C. If I'm on A and it intersects B and not C I have made its round and hence will not need to keep it in the loop. Hence once I'm on B it will not recheck A.

I can't think of a faster method. This is akin to finding duplicates using linear search and I think the worst-case complexity will be O(n^2). I don't see how to use binary search as it is possible that there are multiple intersections. I've been also looking at the sorting algorithm but I need one that won't go back and check the older values again.

greybeard
  • 2,249
  • 8
  • 30
  • 66
wasi hassan
  • 63
  • 10

3 Answers3

2

You can solve this in O(n log n). Since you're trying to solve a static intersection problem in two dimensions, a common trick is to transform the problem into a dynamic intersection problem in one dimension (i.e., one space dimension becomes the time dimension).

Suppose you have a closed rectangle with lower left corner (x1, y1) and upper right corner (x2, y2). This rectangle becomes two "interval events":

Interval [y1, y2] inserted at time x1
Interval [y1, y2] removed after time x2

Transform all your rectangles into events in this way, and then sort by time (breaking ties with insertions coming before removals).

Now, you need a data structure that lets you add and remove intervals [A, B], and also query the number of intervals in the data structure intersecting [A, B]. Then you process the "interval events" in the sorted order, but keep a running sum of how many current intervals intersect [A, B] before inserting each interval [A, B].

One data structure to do this in O(log n) time per operation is with two balanced binary search trees: one holding beginning-points of intervals, and the other holding ending-points of intervals. You'll also need to augment each BST to be an Order Statistic Tree, to quickly query the number of points less than or equal to a certain value that are in the tree.

Then, finding how many intervals currently in your data structure intersect an arbitrary interval [A, B] is simple; that count is:

#(Intervals intersecting [A, B]) =

  #(values in the beginning-points tree that are <= B) 
- #(values in the ending-points tree that are < A)

which you can check is correct from the definition of two intervals intersecting: neither interval starts after the other one has ended.

You could also replace the order statistic tree with a data structure for prefix-sums, like a Fenwick tree, which requires much less code for the same complexity.

kcsquared
  • 5,244
  • 1
  • 11
  • 36
  • This is really smart but wont I get doubles though? – wasi hassan Apr 04 '22 at 07:13
  • @wasihassan What do you mean by doubles? – kcsquared Apr 04 '22 at 07:21
  • like it won't count intersections twice e.g I have boxes A, B, and C. If A intersects with B, B intersecting with A will still count as 1 intersection instead of 2 separate ones. @kcsquared – wasi hassan Apr 04 '22 at 08:08
  • @wasihassan Thanks for the question; no, it will only count intersections once. The most complex part of the solution is writing an Order-statistic tree, if you don’t have access to one in a library – kcsquared Apr 04 '22 at 08:42
  • 1
    Sorry to bother you again and this is probably a dumb question but how would I compare points like how would I compare values in the beginning-points tree that are <= B do I have to compare x and y of one set to the x and y of the other and only if both values of one set are greater then one is point greater then the other? @kcsquared – wasi hassan Apr 04 '22 at 11:38
  • @wasihassan No worries, ask anything. It's probably easiest to show an actual implementation or detailed pseudocode-- I only have Order-statistic tree code in Python, so I'll have to post some pseudocode for the algorithm – kcsquared Apr 04 '22 at 20:56
1

Sample implementation of kcsquared's algorithm. You didn't specify a language, so I chose Python since I'm familiar with it and it's short readable code. Rectangles have lower left coordinate (x,y) and upper right coordinate (X,Y).

def intersections(rects):
    events = []
    for A in rects:
        events.append((A.x, 'insert', A.y, A.Y))
        events.append((A.X, 'remove', A.y, A.Y))
    intersections = 0
    ys = SortedList()
    Ys = SortedList()
    for x, op, y, Y in sorted(events):
        if op == 'insert':
            intersections += ys.bisect_right(Y) - Ys.bisect_left(y)
            ys.add(y)
            Ys.add(Y)
        else:
            ys.remove(y)
            Ys.remove(Y)
    return intersections

Tests on lists of 5,000 random rectangles against an O(n2) reference implementation, showing the numbers of intersections and runtimes:

124257  5465 ms  reference
124257   127 ms  intersections

121166  5444 ms  reference
121166   124 ms  intersections

118980  5435 ms  reference
118980   124 ms  intersections

With 10,000 rectangles:

489617  22342 ms  reference
489617    292 ms  intersections

489346  22491 ms  reference
489346    296 ms  intersections

489990  22302 ms  reference
489990    290 ms  intersections

Full code:

def reference(rects):
    intersections = 0
    for A, B in combinations(rects, 2):
        if A.X >= B.x and A.x <= B.X and A.Y >= B.y and A.y <= B.Y:
            intersections += 1
    return intersections

def intersections(rects):
    events = []
    for A in rects:
        events.append((A.x, 'insert', A.y, A.Y))
        events.append((A.X, 'remove', A.y, A.Y))
    intersections = 0
    ys = SortedList()
    Ys = SortedList()
    for x, op, y, Y in sorted(events):
        if op == 'insert':
            intersections += ys.bisect_right(Y) - Ys.bisect_left(y)
            ys.add(y)
            Ys.add(Y)
        else:
            ys.remove(y)
            Ys.remove(Y)
    return intersections

from random import randint, randrange
from itertools import combinations
from timeit import default_timer as timer
from sortedcontainers import SortedList
from collections import namedtuple

Rect = namedtuple('Rect', 'x X y Y')

for _ in range(3):
    rects = [Rect(x, x + randint(1, 100), y, y + randint(1, 100))
             for _ in range(5000)
             for x, y in [(randrange(1000), randrange(1000))]]
    for func in reference, intersections:
        t0 = timer()
        result = func(rects)
        t1 = timer()
        print(result, '%4d ms' % ((t1 - t0) * 1e3), func.__name__)
    print()
Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
0

You can somewhat reduce the average complexity, more precisely the computation time, but you'll always get a worst case in ... Because, inherently, that's a O(n²) algorithm, since you have to test two by two every N boxes.

One way to reduce that computation time is to "attach", to each box, its circumscribed sphere. It's quite simple to compute: its center is the barycenter of the 8 vertexes of the box / 4 vertexes of the rectangle, and its radius is any distance from this barycenter to one vertex.

The trick is: given two boxes, if their circumscribed spheres aren't intersecting (distance between the two centers is greater than the sum of the two radiuses), then the boxes can not intersect. If the two spheres intersect, then the boxes may intersect.

This trick doesn't change at all the complexity, but it reduce A LOT the computation time, since it's way faster to test for spheres than for boxes (especially if they aren't parallel to axes...). And it will reduce the list size, up to empty it for any potential intersections. Also, you also have only a shortened (even empty) list of potential intersections for each box.

You can gain some computation time by using a list of boxes pairs to finely test, using squares of distances for comparisons, etc.: it's not so much, but in the end, it still saves some time. Then, you can compute the real boxes intersections with way less boxes pairs.

This would be a significative boost with a lot of 3D random boxes unaligned with axes, and it won't be with few 2D rectangles aligned with axes. In both cases, you'll always be with a O(n²) complexity (i.e. the worst case).

Wisblade
  • 1,483
  • 4
  • 13
  • The circle Idea is pretty solid but won't the gaps between the square and circle give false positives. Also sadly it can only be with squares. – wasi hassan Apr 03 '22 at 02:08
  • @wasihassan Yes, of course, you'll get false positives... Because it's a fast criteria to quickly remove _**impossible**_ intersections, so to speed up computation time by greatly reducing the number of tested boxes. It's quite the same kind of technique used for sprites collision in video games. And no, it works with every possible **convex** volume. Barycenter is always at the same distance of each edge (definition), and due to convexity, the circumscribed sphere is defined exactly the same way (definition, too). It works in 2D, 3D, 4D, ... and for all convex objects. – Wisblade Apr 03 '22 at 11:21
  • @wasihassan It will be still way faster than testing unconditionnally all boxes' intersections without any clue of if an intersection is possible (or not)... – Wisblade Apr 03 '22 at 11:23
  • @greybeard In 3D, there is 8 (question was unclear about 2D/3D at this time). For "rectangle", that's a typo, I meant "box". – Wisblade Apr 06 '22 at 10:29
  • @KellyBundy Oh f.... I mixed up vertex and edges again... Don't know why I always mix up the two translations in my head - as you guessed it, English is not my native language. – Wisblade Apr 06 '22 at 11:01
  • @Wisblade I sometimes mix them up as well, in my native German it's Edge=Ecke and Corner=Kante, so they look and sound somewhat similar in both languages. Or at least I think that's why I sometimes mix them up, since it's actually exactly the opposite, edge=Kante and corner=Ecke :-) – Kelly Bundy Apr 06 '22 at 11:12
  • (With the edit, I think I see what you are getting at. If I thought the question asked well, I wouldn't have edited it.) – greybeard Apr 06 '22 at 17:41
  • @greybeard No problem, was my bad also... As I said to OP, this algo is designed for **very** complex/time-consuming intersections, of volumes (or more...) in any size, shape, orientation, etc. in order to greatly reduce, with a `O(n²`)` but _very_ fast first pass that should avoid a lot of impossible intersections... Then, the final test will be a `O(m²)` algorithm, with `m<=n` (and often, `m` is _greatly_ lower than `n`). When correctly used, it allows to gain a magnitude order in complexity... Because computing intersection of two irregular dodecahedrons, for example, can be a real pain. – Wisblade Apr 06 '22 at 20:03