15

I'm looking for an algorithm to do a best fit of an arbitrary rectangle to an unordered set of points. Specifically, I'm looking for a rectangle where the sum of the distances of the points to any one of the rectangle edges is minimised. I've found plenty of best fit line, circle and ellipse algorithms, but none for a rectangle. Ideally, I'd like something in C, C++ or Java, but not really that fussy on the language.

The input data will typically be comprised of most points lying on or close to the rectangle, with a few outliers. The distribution of data will be uneven, and unlikely to include all four corners.

Charles
  • 50,943
  • 13
  • 104
  • 142
SmacL
  • 22,555
  • 12
  • 95
  • 149
  • 1
    First I wanted to offer simply finding min/max coordinates of your points, but that will only work for rectangles, parallel to either axis. I guess, you want arbitrary rotated rectangle? – David Jashi Jul 03 '13 at 09:03
  • 2
    To feed an heuristic: Are the points that are "mostly lying on or close to the rectangle" grouped (i.e. 'stretched groups' along line(s)) with small distances between them? And if so, are these distances much smaller than the dimensions of the rectangle? Also important: how many points are there more or less per rectangle side? Have a sample image by any chance? – meaning-matters Jul 03 '13 at 09:12
  • @meaning-matters - the grouping of the points is such that where they are dense, the typical distance between adjacent points will be a fraction of the side length of the rectangle. My current algorithm is to look for adjacent points, line fit them, reject outliers, line fit again, and take this as one side. Given one side, I then repeat looking for parallel lines to give me a provisional rectangle. I could possibly do a least squares variation of coordinates adjustment to precisely fit a rectangle, holding the right angles as fixed observations – SmacL Jul 03 '13 at 09:27
  • @ShaneMacLaughlin Sounds already like a good method. Are there specific issue that led you to looking for alternatives? – meaning-matters Jul 03 '13 at 09:32
  • @meaning-matters, just that a least square variation of coordinates seems like using a hammer to crack a nut, but looking at the other answers given, it could well be the best approach. – SmacL Jul 03 '13 at 09:39
  • 1
    You should specify in your question whether the rectangle can be rotated or not. – Kendall Frey Jul 03 '13 at 13:28
  • @KendallFrey I assumed it can be. – Will Ness Jul 03 '13 at 13:34
  • 1
    "the sum of the distances of the points to any one of the rectangle edges is minimised" This corresponds to the median. Is this a requirement, or does another best-fit case (such as mean) fit better? – Kendall Frey Jul 03 '13 at 13:42
  • 1
    @KendallFrey, the orientation of the rectangle is not known, and hence can be rotated. – SmacL Jul 03 '13 at 14:10
  • @KendallFrey Not quite sure what you mean in terms of median. If we consider the distance between any given point and the final rectangle to be that points residual, the best fit is given by minimizing the residuals. – SmacL Jul 03 '13 at 14:17
  • 1
    Not necessarily. Any system that aims to minimize total distance is aiming for a [median](http://en.wikipedia.org/wiki/Geometric_median). It all depends on the definition of 'best fit', which is often defined as a mean. I think least-squares is an example of a mean. – Kendall Frey Jul 03 '13 at 14:22
  • least-squares minimizes the square of the distance from the mean (or in this case, the square of the distance from the rectangle). Minimizing the distance from the mean/rectangle gives a different answer. – Teepeemm Jul 03 '13 at 14:30
  • I could come up with a true least-squares algorithm if the rectangle had sides that extended infinitely far (looking more like a tic-tac-toe grid). But this would only work in your case if you have very few points on the semi-infinite portions. How many outliers do you end up with, and is there any location where they tend to cluster? – Teepeemm Jul 04 '13 at 18:46
  • 1
    Rather than try to find a computational method for finding this I might instead go for an iterative method. I'm thinking maybe a particle filter where the rectangle guesses are the particles and solutions which minimize the distance are selected. Mutations change the size, rotation and aspect ratio of the rectangles. This would lend itself nicely to GPGPU computation, and would be pretty robust to weird point distributions. You'd want to establish a sufficient fit criteria (or just let it run until you have something else to do). – Speed8ump Jul 06 '13 at 05:22

4 Answers4

3

Here are some ideas that might help you.

We can estimate if a point is on an edge or on a corner as follows:

  1. Collect the point's n neares neighbours
  2. Calculate the points' centroid
  3. Calculate the points' covariance matrix as follows:
    1. Start with Covariance = ((0, 0), (0, 0))
    2. For each point calculate d = point - centroid
    3. Covariance += outer_product(d, d)
  4. Calculate the covariance's eigenvalues. (e.g. with SVD)
  5. Classify point:
    • if one eigenvalue is large and the other very small, we are probably on an edge
    • otherwise we should be on a corner

Extract all corner points and do a segmentation. Choose the four segments with most entries. The centroid of those segments are candidates for the rectangle's corners.

Calculate the normalized direction vectors of two opposite sides and calculate their mean. Calculate the mean of the other two opposite sides. These are the direction vectors of a parallelogram. If you want a rectangle, calculate a perpendicular vector to one of those directions and calculate the mean with the other direction vector. Then the rectangle's direction's are the mean vector and a perpendicular vector.

In order to calculate the corners, you can project the candidates on their directions and move them so that they form the corners of a rectangle.

Nico Schertler
  • 32,049
  • 4
  • 39
  • 70
1

Here's a general idea. Make a grid with smallish cells; calculate best fit line for each not-too-empty cell (the calculation is immediate1, there's no search involved). Join adjacent cells while making sure the standard deviation is improving/not worsening much. Thus we detect the four sides and the four corners, and divide our points into four groups, each belonging to one of the four sides.

Next, we throw away the corner cells, put the true rectangle in place of the four approximate lines and do a bit of hill climbing (or whatever). The calculation of best fit line may be augmented for this case, since the two lines are parallel, and we've already separated our points into the four groups (for a given rectangle, we know the delta-y between the two opposing sides (taking horizontal-ish sides for a moment), so we just add this delta-y to the ys of the lower group of points and make the calculation).

The initial rectangular grid may be replaced with working by stripes (say, vertical). Then, at least half of the stripes will have two pronounced groupings of points (find them by dividing each stripe by horizontal division lines into cells).


1For a line Y = a*X+b, minimize the sum of squares of perpendicular distances of data points {xi,yi} to that line. This is directly solvable for a and b. For more vertical lines, flip the Xs and the Ys.

P.S. I interpret the problem as minimizing the sum of squares of perpendicular distances of each point to its nearest side of the rectangle, not to all the rectangle's sides.

Will Ness
  • 70,110
  • 9
  • 98
  • 181
1

The idea of a line of best fit is to compute the vertical distances between your points and the line y=ax+b. Then you can use calculus to find the values of a and b that minimize the sum of the squares of the distances. The reason squaring is chosen over absolute value is because the former is differentiable at 0.

If you were to try the same approach with a rectangle, you would run into the problem that the square of the distance to the side of a rectangle is a piecewise defined function with 8 different pieces and is not differentiable when the pieces meet up inside the rectangle.
8 regions where the distance to a rectangle is different
In order to proceed, you'll need to decide on a function that measures how far a point is from a rectangle that is everywhere differentiable.

Teepeemm
  • 4,331
  • 5
  • 35
  • 58
  • You'll get other issues with differentiability when the points move between pieces as the rectangle is rotated. – Kendall Frey Jul 03 '13 at 14:54
  • To find a continuous approximation to the *maximum* of a set of numbers, you can raise them all to some power k, sum the resulting values, and then take the kth root of that sum. The larger k is, the closer the result will be to the maximum. To find the minimum, you can do the same thing, but first take reciprocals of each number, and finally take the reciprocal of the result. – j_random_hacker Jul 04 '13 at 10:18
0

I am not completely sure, but You might play around first 2 (3?) dimensions over the PCA from your points. it will work reasonably fast for the most cases.

Philipp_Kats
  • 3,872
  • 3
  • 27
  • 44