5

I have a set of axis parallel 2d rectangles defined by their top left and bottom right hand corners(all in integer coordinates). Given a point query, how can you efficiently determine if it is in one of the rectangles? I just need a yes/no answer and don't need to worry about which rectangle it is in.

I can check if (x,y) is in ((x1, y1), (x2, y2)) by seeing if x is between x1 and x2 and y is between y1 and y2. I can do this separately for each rectangle which runs in linear time in the number of rectangles. But as I have a lot of rectangles and I will do a lot of point queries I would like something faster.

marshall
  • 2,443
  • 7
  • 25
  • 45
  • it can be done with 2D BIT if x and y co-ordinates of the vertices of the rectangle are not large (ie: within 10^3) – Fallen Nov 12 '13 at 14:58
  • @Fallen They are at most 65535. – marshall Nov 12 '13 at 15:01
  • @marshall, a loop of 65535 simple `if`s would take roughly ~1-10ms. While it's a good idea to pursuit better algorithms, can you first verify that this time is too big for you? – Shahbaz Nov 12 '13 at 15:09
  • @Shahbaz - coordinates are at most 65k, making bit mask lookup impractical (512M if using one bit per point). – Floris Nov 12 '13 at 15:13
  • @pablochan I will do 10-100 million queries and I have about a million rectangles. – marshall Nov 12 '13 at 15:25

4 Answers4

6

The answer depends a little bit on how many rectangles you have. The brute force method checks your coordinates against each rectangular pair in turn:

found = false
for each r in rectangles:
  if point.x > r.x1 && point.x < r.x2:
    if point.y > r.y1 && point.y < r.y2
      found = true
      break

You can get more efficient by sorting the rectangles into regions, and looking at "bounding rectangles". You then do a binary search through a tree of ever-decreasing bounding rectangles. This takes a bit more work up front, but it makes the lookup O(ln(n)) rather than O(n) - for large collections of rectangles and many lookups, the performance improvement will be significant. You can see a version of this (which looks at intersection of a rectangle with a set of rectangles - but you easily adapt to "point within") in this earlier answer. More generally, look at the topic of quad trees which are exactly the kind of data structure you would need for a 2D problem like this.

A slightly less efficient (but faster) method would sort the rectangles by lower left corner (for example) - you then need to search only a subset of the rectangles.

If the coordinates are integer type, you could make a binary mask - then the lookup is a single operation (in your case this would require a 512MB lookup table). If your space is relatively sparsely populated (i.e. the probability of a "miss" is quite large) then you could consider using an undersampled bit map (e.g. using coordinates/8) - then map size drops to 8M, and if you have "no hit" you save yourself the expense of looking more closely. Of course you have to round down the left/bottom, and round up the top/right coordinates to make this work right.

Expanding a little bit with an example:

Imagine coordinates can be just 0 - 15 in x, and 0 - 7 in y. There are three rectangles (all [x1 y1 x2 y2]: [2 3 4 5], [3 4 6 7] and [7 1 10 5]. We can draw these in a matrix (I mark the bottom left hand corner with the number of the rectangle - note that 1 and 2 overlap):

...xxxx.........
...xxxx.........
..xxxxx.........
..x2xxxxxxx.....
..1xx..xxxx.....
.......xxxx.....
.......3xxx.....
................

You can turn this into an array of zeros and ones - so that "is there a rectangle at this point" is the same as "is this bit set". A single lookup will give you the answer. To save space you could downsample the array - if there is still no hit, you have your answer, but if there is a hit you would need to check "is this real" - so it saves less time, and savings depend on sparseness of your matrix (sparser = faster). Subsampled array would look like this (2x downsampling):

.oxx....
.xxooo..
.oooxo..
...ooo..

I use x to mark "if you hit this point, you are sure to be in a rectangle", and o to say "some of these are a rectangle". Many of the points are now "maybe", and less time is saved. If you did more severe downsampling you might consider having a two-bit mask: this would allow you to say "this entire block is filled with rectangles" (i.e. - no further processing needed: the x above) or "further processing needed" (like the o above). This soon starts to be more complicated than the Q-tree approach...

Bottom line: the more sorting / organizing of the rectangles you do up front, the faster you can do the lookup.

Community
  • 1
  • 1
Floris
  • 45,857
  • 6
  • 70
  • 122
  • A binary search based solution would be perfect if this got the time down to O(log(n)) . Can you explain the integer mask idea more please? – marshall Nov 12 '13 at 15:01
  • @marshall - Updated with link to binary algorithm. – Floris Nov 12 '13 at 15:07
  • If each row has 2^16 bits and there are 2^16 rows don't we end up using 2^32 space this way? – marshall Nov 12 '13 at 16:21
  • @marshall - yes. At 1 bit per position, that's 2^24 bytes, or 256 MB. Large, but not impossible. Given the number of rectangles and lookups you envisage, it may well be the most efficient (but note that things won't be "super" fast because your array will not be in cache.) – Floris Nov 12 '13 at 19:23
  • Sorry to be dim but why is it 2^24 and not 2^32? – marshall Nov 12 '13 at 19:44
  • Sorry having a soggy brain moment. Bottom line - it's bits vs bytes. Proposing to store information "one bit per point". with 2^16 to the side, it is not 2^32 bytes, but 2^32 bits. But that is still 2^29 bytes - so 512 MB (MegaBytes). Which is what I had originally said before I second-guessed myself and said 256 (no idea where that number came from). See comment I made earlier today, below your question. – Floris Nov 12 '13 at 19:49
0

Store the coordinate parts of your rectangles to a tree structure. For any left value make an entry that points to corresponding right values pointing to corresponding top values pointing to corresponding bottom values.

To search you have to check the x value of your point against the left values. If all left values do not match, meaning they are greater than your x value, you know the point is outside any rectangle. Otherwise you check the x value against the right values of the corresponding left value. Again if all right values do not match, you're outside. Otherwise the same with top and bottom values. Once you find a matching bottom value, you know you are inside of any rectangle and you are finished checking.

As I stated in my comment below, there are much room for optimizations, for example minimum left and top values and also maximum right and botom values, to quick check if you are outside.

The following approach is in C# and needs adaption to your preferred language:

public class RectangleUnion
{
    private readonly Dictionary<int, Dictionary<int, Dictionary<int, HashSet<int>>>> coordinates =
        new Dictionary<int, Dictionary<int, Dictionary<int, HashSet<int>>>>();

    public void Add(Rectangle rect)
    {
        Dictionary<int, Dictionary<int, HashSet<int>>> verticalMap;

        if (coordinates.TryGetValue(rect.Left, out verticalMap))
            AddVertical(rect, verticalMap);
        else
            coordinates.Add(rect.Left, CreateVerticalMap(rect));
    }

    public bool IsInUnion(Point point)
    {
        foreach (var left in coordinates)
        {
            if (point.X < left.Key) continue;

            foreach (var right in left.Value)
            {
                if (right.Key < point.X) continue;

                foreach (var top in right.Value)
                {
                    if (point.Y < top.Key) continue;

                    foreach (var bottom in top.Value)
                    {
                        if (point.Y > bottom) continue;

                        return true;
                    }
                }
            }
        }

        return false;
    }

    private static void AddVertical(Rectangle rect,
        IDictionary<int, Dictionary<int, HashSet<int>>> verticalMap)
    {
        Dictionary<int, HashSet<int>> bottomMap;
        if (verticalMap.TryGetValue(rect.Right, out bottomMap))
            AddBottom(rect, bottomMap);
        else
            verticalMap.Add(rect.Right, CreateBottomMap(rect));
    }

    private static void AddBottom(
        Rectangle rect,
        IDictionary<int, HashSet<int>> bottomMap)
    {
        HashSet<int> bottomList;
        if (bottomMap.TryGetValue(rect.Top, out bottomList))
            bottomList.Add(rect.Bottom);
        else
            bottomMap.Add(rect.Top, new HashSet<int> { rect.Bottom });
    }

    private static Dictionary<int, Dictionary<int, HashSet<int>>> CreateVerticalMap(
        Rectangle rect)
    {
        var bottomMap = CreateBottomMap(rect);
        return new Dictionary<int, Dictionary<int, HashSet<int>>>
                   {
                       { rect.Right, bottomMap }
                   };
    }

    private static Dictionary<int, HashSet<int>> CreateBottomMap(Rectangle rect)
    {
        var bottomList = new HashSet<int> { rect.Bottom };
        return new Dictionary<int, HashSet<int>>
                   {
                       { rect.Top, bottomList }
                   };
    }
}

It's not beautiful, but should point you in the right direction.

abto
  • 1,583
  • 1
  • 12
  • 31
  • Would you be able to explain what it is doing in English maybe? I don't know C#. – marshall Nov 12 '13 at 16:29
  • Can you explain how this is "fast"? – Floris Nov 12 '13 at 19:28
  • It's hard to write it in "English" so i phrased it in C#, but you can imagine as a sort of pseudo code. But I'll try to rework the code to be more self explanatory. The speed of this alogorithm lays in the tee structure of the coordinate storage. I know, there could be many drawbacks like any rectangle creates a single path or the point is at the rightmost bottom etc. but the more likely is that the point is somewhere inside. There is much room for optimization. Also this approach is only usefull if you whant to check many points against the union. – abto Nov 13 '13 at 00:38
0

My favourite for a variety of 2D geometry queries is Sweep Line Algorithm. It's widely utilize in CAD software, which would be my wild guess for the purpose of your program.

Basically, you order all points and all polygon vertices (all 4 rectangle corners in your case) along X-axis, and advance along X-axis from one point to the next. In case of non-Manhattan geometries you would also introduce intermediate points, the segment intersections.

The data structure is a balanced tree of the points and polygon (rectangle) edge intersections with the vertical line at the current X-position, ordered in Y-direction. If the structure is properly maintained it's very easy to tell whether a point at the current X-position is contained in a rectangle or not: just examine Y-orientation of the vertically adjacent to the point edge intersections. If rectangles are allowed to overlap or have rectangle holes it's just a bit more complicated, but still very fast.

The overall complexity for N points and M rectangles is O((N+M)*log(N+M)). One can actually prove that this is asymptotically optimal.

Michael
  • 5,775
  • 2
  • 34
  • 53
0

Identify your rectangles by integers (0..n-1). Use two interval trees. One holding the dx (x_min, x_max) of each rectangle: itvx. The other one the dy of each rectangle: itvy. Let's call q your query 2D point. query itvx with q.x -> return a set of integers (potentially hit rectangles) query itvy with q.y -> return another set of integers. The intersection of those two integer sets is the set of hit rectangles.

daruma
  • 197
  • 4
  • 8