10

Given a set of distinct points in 2D space, and a rectangle (coordinates of all four points, sides parallel with xy axis) how can I quickly find which points are inside the rectangle?

I'm not interested in the basic solution of going through all points and seeing which one is inside the rectangle. What I'm looking for is an algorithm which will give me fast query times per rectangle.

I don't care how much time I spend in the preprocessing step. What I do care is that after I process my data I obtain a useful structure which gives me fast query times per rectangle.

I know for example I can count how many points I have inside a rectangle in O(logN). That works because I do a lot of heavy processing in the beginning and then query the processed data with a new rectangle every time and get a new count in logN time. I'm looking for a similar algorithm for finding the actual points not just their count.

Alex
  • 367
  • 4
  • 15
  • Is the rectangle rotated? If not, its just a simple AABB check: `if(p.x >= rect.x && px. <= rect.x + rect.width){ ... }` – Draco18s no longer trusts SE Dec 14 '15 at 15:26
  • 1
    See this post: http://stackoverflow.com/questions/10269179/find-rectangles-that-contain-point-efficient-algorithm – Alex Dec 14 '15 at 15:26
  • 1
    I don't get how you are proposing it in LogN time. For N points you will at least have to go through all N points once. The best you can get is O(N). – displayName Dec 14 '15 at 15:33
  • 1
    I'm looking for fast query times. i.e. my points are given and fixed. what changes from query to query is the rectangle. I'm looking for fast query times for each rectangle (the preprocessing step may take whatever, that is done once, what I care about are query times). – Alex Dec 14 '15 at 15:38
  • Do the rectangles happen to have the same center? – CompuChip Dec 14 '15 at 15:55
  • The rectangles can be of any width/height and in any location – Alex Dec 14 '15 at 15:57
  • What would be the max width and height (in points) of that 2D space? – Déjà vu Dec 14 '15 at 16:20
  • 1
    The complexity will be at least O(Log N+K) where K is the number of points reported; the second term can exceed the first. –  Dec 14 '15 at 16:23

5 Answers5

11

A classical answer is the kD-tree (2D-tree in this case).

For a simple alternative, if your points are spread uniformly enough, you can try by gridding.

Choose a cell size for a square grid (if the problem is anisotropic, use a rectangular grid). Assign every point to the grid cell that contains it, stored in a linked list. When you perform a query, find all cells that are overlapped by the rectangle and scan them to traverse their lists. For the partially covered cells, you will need to perform the point-in-rectangle test.

The choice of the size is important: too large can result in too many points needing to be tested anyway; too small can result in too many empty cells.

3

You are looking for kd-tree range search or range query.

  • Quadtrees (or octtrees or 16-trees...) are better when there is a changing set of points, but you expect the distribution to be uniform. No further balancing steps are needed, because the structure of the tree is fixed.
  • kd-trees perform better on a fixed set of point, even with non-uniform distribution. When the point set changes, it is hard (but not impossible) to do self-balancing steps.
  • AABB trees (or fat AABB trees) provide a fast way for testing overlapping shapes, not only points. AABB trees need some balancing occasionally. When constantly moving objects are included, common way is to use "fat AABB-s", so you don't have to update the tree in every frame.
  • Sorting by only one axis and using binary search (something like abelenky suggested, but I think there is no point in doing a second binary search) is a simple and elegant solution, but will be slow when for example you sort by the X axis, and all the points are on a line parallel with Y. You have to do a linear filtering on the points which are matched by the binary searches by X. Time complexity is worst case O(n), but this worst case happens pretty often.

All these algorithms run queries in average O(log n + k) where k is the count of matched points.

Gridding, like Yves suggested, can perform range search in O(k) time, but only when the size of the query rectangle is bounded. This is what they often do in particle simulations. Gridding can be used even when the input set is not bounded -- just make a fixed count of buckets based on hash of the grid coordinates. But if the query rectangle can be of arbitrary size, then gridding is a no-go.

Tamas Hegedus
  • 28,755
  • 12
  • 63
  • 97
  • Do you have a quick link that explains your second point? I'd intuitively expect the opposite. – Jens Schauder Dec 15 '15 at 06:28
  • @JensSchauder no, I don't have, it's just intuition, backed by the fact sometimes you need to rebalance the tree. In fact, it is possible that rebalancing the tree doesn't take so much time. I'll do some research and measure the approaches sometime. – Tamas Hegedus Dec 15 '15 at 13:01
  • Oh, I might have misunderstood your point. I understood it as kd-trees perform on a fixed set of points better than Quadtrees. But now I think you mean better than on changing point sets? – Jens Schauder Dec 15 '15 at 13:11
  • @JensSchauder sry for not being clear. I did mean that kd-trees are better when you have a fixed set of points. Imagine a clustered point set in a big search space. kd-trees space partitioning adapts to the cluster, while quadtree needs to split the whole search space in four until the cluster is reached. That means the quadtree will be deeper. – Tamas Hegedus Dec 15 '15 at 13:17
  • @JensSchauder But based on http://cstheory.stackexchange.com/questions/8470/why-would-one-ever-use-an-octree-over-a-kd-tree depth is only one point of view. aspect ratio of the subspaces can be worse with kd-trees. – Tamas Hegedus Dec 15 '15 at 13:22
  • The link is broken – codename44 Feb 27 '18 at 08:00
1

You could group point in sectors. If a sector is completely in or out of given rectangle then all point within it are also in or out. If a sector is partially in then you have to search O(n) for points in that sector to check if they are in the rectangle. Look for k-d tree search.

Dialecticus
  • 16,400
  • 7
  • 43
  • 103
1

Along with other answers, you can also look into Morton codes (z-order curve sorting).

In your case, that is static data, you can even represent the whole point data as an array.

https://en.wikipedia.org/wiki/Z-order_curve

This paper also have a rather complicated timeline of different "multi-dimentional access methods" --http://www.cc.gatech.edu/computing/Database/readinggroup/articles/p170-gaede.pdf

hkrish
  • 1,259
  • 1
  • 10
  • 11
0

I think you should store your points in a quadtree. I have not worked out the details, but it should allow to basically do something similar to a binary search that directly yields the points that are inside an rectangle.

If your points are clustered, i.e. there exist clusters that contain many points in a small area and other areas that contain no, or very few points an R-tree might be even better.

Runtime complexity should be O(logN) I think.

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348