0

I have a set of problems where I have multiple coordinates and I want to quickly detect where that coordinate lies within a list of pre-made boxes.

I provide a code for this type of problem. But I am using "for loop" iteration, which I think is quite slow given my current situation dealing with big data. That being set I might have thousands of boxes (rectangles/squares) and coordinates.

I am looking for any other alternative that may be efficient and fast for this type of problem?

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

pad = 1
xn1 = np.linspace(0-(pad*2), 0+(pad*2), 3)
yn1 = np.linspace(0-(pad*2), 0+(pad*2), 3)
print(xn1, yn1)

xn1_list = []
yn1_list = []
xy1_list = []

# Create the coordinates
for p1 in range(0, len(xn1)):
    for q1 in range(0, len(yn1)):
        xn1_list.append(xn1[p1]) 
        yn1_list.append(yn1[q1])

for pad1, coord1 in enumerate(zip(xn1_list, yn1_list)):
    xy1_list.append(coord1)
print('\nxy1_list',xy1_list)

# Plot
fig, (ax1) = plt.subplots(figsize = (8,8))
for i in range(0, len(xy1_list)):
    # print(len(xy3_list))
    plt.scatter(xy1_list[i][0],xy1_list[i][1])
    ax1.add_patch(Rectangle(xy=(xy1_list[i][0]-(pad*2)/2, xy1_list[i][1]-(pad*2)/2) ,width=pad*2, height=pad*2,
                                            linewidth=1, color='blue', fill=False))
    ax1.annotate(i, xy=(xy1_list[i][0], xy1_list[i][1]), textcoords='data', fontsize = 15) 
    
plt.xticks(np.arange(-3, 3.1, step=1))
plt.yticks(np.arange(-3, 3.1, step=1))

# Current coordinate
X_cur, Y_cur = 0.5,2
plt.scatter(X_cur, Y_cur, color='r', marker='x')

# Iterate every possible box in the list 
for i in range(0,len(xy1_list)):
    Xmin_temp, Xmax_temp = xy1_list[i][0]-pad, xy1_list[i][0]+pad
    Ymin_temp, Ymax_temp = xy1_list[i][1]-pad, xy1_list[i][1]+pad
    
    # Check if the current coordinate is in one of the boxes
    if (Xmin_temp < X_cur <= Xmax_temp) & (Ymin_temp < Y_cur <= Ymax_temp):
        print('Current Coordinate is in Box'+str(i)+' with a centre coordinate of ('+str(xy1_list[i][0])+', '+str(xy1_list[i][1])+')')

Visualize the results: enter image description here

Edited version according to @Green Cloak Guy idea

from math import floor, ceil
subdivisions = {}              # hashtable of subdivisions of the graph
                                # key format: 2-tuple (x, y)
                                # value format: 4-tuple (left, right, bottom, top) 

...

# Plot
fig, (ax1) = plt.subplots(figsize = (8,8))
for i in range(0, len(xy1_list)):
    print('\nCenter Coord of the box'+str(i)+' - '+str(xy1_list[i]))
    # renaming your variables to be easier to work with
    # assuming [x, y] is center of each square
    width = height = pad * 2  # note that you could have differently-shaped rectangles
    left = xy1_list[i][0] - (width / 2)
    right = xy1_list[i][0] + (width / 2)
    bottom = xy1_list[i][1] - (height / 2)
    top = xy1_list[i][1] + (height / 2)
    print('x1:'+str(left)+' x2: '+str(right)+' y1: '+str(bottom)+' y2: '+str(top))

    # add rectangle to plot, as you do in your code
    plt.scatter(xy1_list[i][0], xy1_list[i][1])
    ax1.add_patch(Rectangle(xy=(xy1_list[i][0]-width/2, xy1_list[i][1]-height/2), width=width, height=height,
                  linewidth=1, color='blue', fill=False))
    ax1.annotate(i, xy=(xy1_list[i][0], xy1_list[i][1]), textcoords='data', fontsize = 15) 

    # # add rectangle to the appropriate subdivision(s)
    x, y = xy1_list[i][0], xy1_list[i][1]
    subdivisions.setdefault((x,y), [])
    subdivisions[(x,y)].append((left, right, bottom, top))

...

# Current coordinate
X_cur, Y_cur = 0.5,2
plt.scatter(X_cur, Y_cur, color='r', marker='x')

# find box(es) coordinate is in
# in this case, that would be the box going from (0, 0) to (2, 2), 
#   which, in the coordinate dictionary, would be (0, 0)
boxes_contained = [
    box
    for box in subdivisions.get((x,y),[])
    if (box[0] <= X_cur <= box[1]) and (box[2] <= Y_cur <= box[3])
]

But it does not return anything in the boxes_contained. How can I fix this based on the new code?

Mz Irn
  • 1
  • 3
  • Is it possible for 1 point to be in more than 1 box? – PeptideWitch Jan 27 '22 at 05:34
  • You can remove one of your for-loops by assigning the ```xn1_list``` and ```yn1_list``` coordinates to a tuple immediately, rather than creating two lists and then zipping them together (you would then have a single list of tuples). Not sure of the speed improvements, but you wouldn't be appending to three lists anymore. To create a scatter plot of these coordinates, you can take a look at https://stackoverflow.com/questions/17478779/make-scatter-plot-from-set-of-points-in-tuples – ChaddRobertson Jan 27 '22 at 05:34
  • 1
    Do you need to know if a set of coordinates is in *any* of the boxes, or which boxes it is in (finding all of them)? – Grismar Jan 27 '22 at 05:34
  • @PeptideWitch I have thousands of coordinates (points) and I want to locate them quickly. Those points may be in any of the possible boxes – Mz Irn Jan 27 '22 at 05:37
  • 2
    Group your boxes into another data structure - divide the grid into some-size segments, and have each segment maintain a reference to all boxes that intersect with it (you could use a `dict` for this, for constant-time lookups). Then, when you need to find whether a point is in any box, you can just figure out which segment the point is in, and check intersection with all the boxes in just that segment – Green Cloak Guy Jan 27 '22 at 05:38
  • @ChaddRobertson I need to know which boxes that coordinate is in. (Which coordinates belong to which boxes) – Mz Irn Jan 27 '22 at 05:42
  • @GreenCloakGuy Your idea seems to be promising but could you provide a sample code? if you don't mind – Mz Irn Jan 27 '22 at 05:45
  • 1
    @ChaddRobertson that xn1_list and yn1_list is just an example for me to create artificial coordinates. But I appreciate ur comment! – Mz Irn Jan 27 '22 at 05:48

1 Answers1

0

The solution I use for this kind of problem is to subdivide the entire grid into manageably small segments, and track in each segment which rectangles (or other objects) intersect with them. Then, when we have a point, we can simply find which segment of the grid the point is in, and we only have to test against a subset of the total number of rectangles. This is relatively easy to do by iterating once over the list of rectangles.

from math import floor, ceil
sub_size_x, sub_size_y = 2, 2  # record the size of our subdivisions, e.g. 2x2
subdivisions = {}              # hashtable of subdivisions of the graph
                               # key format: 2-tuple (x // sub_size_x, y // sub_size_y)
                               # value format: 4-tuple (left, bottom, width, height)

...

# Plot
fig, (ax1) = plt.subplots(figsize = (8,8))
for i in range(0, len(xy1_list)):
    # renaming your variables to be easier to work with
    # assuming [x, y] is center of each square
    width = height = pad * 2  # note that you could have differently-shaped rectangles
    left = xylist[i][0] - (width / 2)
    bottom = xylist[i][1] - (height / 2)
    right = xylist[i][0] + (width / 2)
    top = xylist[i][1] + (height / 2)

    # add rectangle to plot, as you do in your code
    plt.scatter(left + (width / 2), bottom + (height / 2))
    ax1.add_patch(Rectangle(xy=(left, top), width=width, height=height,
                  linewidth=1, color='blue', fill=False))
    ax1.annotate(i, xy=(xy1_list[i][0], xy1_list[i][1]), textcoords='data', fontsize = 15) 

    # add rectangle to the appropriate subdivision(s)
    for x in range(floor(left / sub_size_x), ceil(right / sub_size_x)):
        for y in range(floor(bottom / sub_size_y), ceil(top / sub_size_y)):
            subdivisions.setdefault((x, y), [])
            subdivisions[(x, y)].append((left, bottom, width, height))

...

# Current coordinate
X_cur, Y_cur = 0.5,2
plt.scatter(X_cur, Y_cur, color='r', marker='x')

# find box(es) coordinate is in
# in this case, that would be the box going from (0, 0) to (2, 2), 
#   which, in the coordinate dictionary, would be (0, 0)
boxes_contained = [
    box
    for box in subdivisions.get((X_cur // sub_size_x, Y_cur // sub_size_y), [])
    if (box[0] <= X_cur <= box[0] + box[2]) and (box[1] <= Y_cur <= box[1] + box[3])
]

(note: my current python installation does not have numpy or pyplot and I'm not in a position to be able to install them right now, so I have not tested this code. But you should get the picture)

Note that this approach works for an arbitrary-sized subdivision, and also for rectangles of different sizes. If you were to use an actual Rectangle or other Shape class instead of the 4-tuple value, then you could modify the code to call a method to check intersection.

Ideally your subdivisions should be sized such that you're not duplicating too much information, and such that not too many boxes overlap the same subdivision. What dimensions are ideal depends on the dataset of boxes you're working woth, and this is a tradeoff between space complexity and time complexity.

Green Cloak Guy
  • 23,793
  • 4
  • 33
  • 53
  • Thanks for your feedback. I think I'm on the right track. I edited the code based on your idea. But there's a minor problem where it does not return anything in the 'boxes_contained'. Can you spot my mistake based on the new code? – Mz Irn Jan 27 '22 at 08:13
  • @MMM ah, i mistyped `X_cur, Y_cur` as `x` and `y` in `(X_cur // sub_size_x, Y_cur // sub_size_y)`. Note that dividing by subdivision size is important because it normalizes the point onto the grid of subdivisions (otherwise, if we have subdivisions at keys (0, 1) and (1, 1), we're not going to find a subdivision at (0.5, 2) unless it's normalized) – Green Cloak Guy Jan 27 '22 at 13:59
  • Thanks. Your method is surprisingly working! Though I do not quite understand it fully on your `subdivision(s)` part. Instead of doing that, couldn't we just append the `xmin`, `xmax`, `ymin` and `ymax` of the centre coordinate of the boxes? Instead of a point (x,y,sub_size_x, sub_size_y) – Mz Irn Jan 27 '22 at 14:40