Finding repeated pattern in a set of points in the plane

Question

Consider the following problem: You have data that describes a rectangular Printed-Circuit-Board (PCB), for example a list of polygons in the plane with type information (that can be represented by a color scheme).

The PCB has a repeated pattern structure, that is there exists a rectangle R0 and vectors v1,...,vN. These typical form a finite lattice, that is they are

p w1 + q w2 + w0

with integers 0 <= p <= P and 0 <= q <= Q and w1, w2 orthogonal and parallel to the coordinate axes.

The PCB consists of repetitions of the part covered by R0, which is repeated in the translated rectangles vi + R0 (the vi + R0 are pairwise non-overlapping, of course).

Now the task is, to recover from a description of the whole PCB the tile rectangle R0 and the translations vi (or to put it otherwise, to find the list of rectangles vi + R0).

I simplified the problem a bit, by taking from each polygon pi a point P(pi) from the plane R^2 such that, if the polygon pj is a translation v + pi of a polygon pi, then P(pj) = v + P(pi).

Also every point occurring as such P(pi) is to be considered "colored" (= marked by an integer number) such that equally colored polygons create equally colored points.

So you have as initial data a set (p1, c1),...., (pM, cM) of colored points in the plane R^2 and again your task is to find the repeated tile and its translation vectors.

Of course the problem is not wholly well defined, if posed this way: If you have e.g a 10 x 20 grid of copies of R0, you could also view it as a 5 x 10 grid of a square made up of four copies of R0 or as a 5 x 20 grid of two copies of R0 above each other, and so on.

So add the assumption, that R0 itself should not be made from a lattice in the above sense by a smaller tile R01, part of R0.

The algorithm I currently use is the following:

Choose a so called "seed rectangle" Rs in the following way:

1a) Put a grid G of size (gx, gy) over the whole arrangement of points. Initialize each grid point with the number 0.

1b) For every point p from the total point set, find its gridpoint (px, py).

1c) In the square [px - d, px + d] x [py - d, py + d] of grid points increase the values of the grid points by 1.

(So 1a-c are a "smear out" of the points, snapped onto a grid).

1d) Find the point (ix, iy) in the grid G with maximal value valmax and consider the list L of points in the grid with at least 90% (or 95%, this is a parameter one can play with) of this value. Select from this list the point p0 = (ix0, iy0) which is lexicographically most upward and left.

1e) Form a "blob" around p0 which consists of the connected component of all grid points with values above 50% of valmax which contains p0. Take the enclosing rectangle of this connected component.

1f) Take this rectangle as seed rectangle Rs.

Find all occurrences of the seed rectangle in the total point set, by doing a form of cross-correlation.

This gives a list of assumed translation vectors v1,...,vN

Now go back to the seed rectangle Rs and try iteratively to enlarge it to a rectangle Rs' under the side-condition, that all translations vi + Rs' must still be present in the total point set.

If this enlargement stops, the current Rs' is assumed to be R0.

This algorithm works on many examples, for example on this one

The red rectangles are the translations of the seed rectangle Rs from above. The blue rectangle is the found R0.

The grid where the points are smeared out (Step 1 from above) looks like

(Here points above 50% of the maximal value are red, below 25% of the maximal value are yellow).

But here it fails

Of course a tiling into 2 x 6 tiles should be found, but instead the seed rectangle is found 192 times, 16 times repeated inside the actual tile. Consequently the correct actual tile is not found, because the enlargement process (Step 3 above) fails too early.

I abstracted this a bit in the following picture

Here filled rectangles of identical size and color should stand for identical point patterns. The intended tiling is of course a 2 x 3 tiling.

It makes clear that in a certain sense, all attempts to find a good "seed rectangle" from analyzing only parts of the whole image (to save computing time) are "doomed". So, if one analyzes in the above picture the content of the rectangle A with the black border, one would choose for example the seed rectangle surrounded by B and find a lot of repetitions of this downwards - most of these unwanted and leading to a failure of the algorithm in Step 3. A seed rectangle surrounded by C would be ok.

So my question is: What could be a better algorithm for finding the rectangle R0 or, alternatively, a seed rectangle Rs from which R0 will always be found correctly?

Remark: Simply doing a cross-correlation of the whole image with itself unfortunately does not always solve the problem. In the failing example above, a small downward shift will typically have a greater correlation value with the image than a comparatively larger shift which moves the first row onto the second row, but the last one is that which should be found.

Any vertical or horizontal slice across the image (with a certain thickness, and sufficient variation) will give a periodic signal. You can find the period by autocorrelation. — , Jul 12 '22 at 19:19
in all the cross/auto/self/*-correlation stuff you need to mess around with the math. simple multiplication assumes no DC component, which is a faulty assumption here. you need something that uses *differences* to evaluate the fit. and you should consider a multi-scale approach. that captures coarse structure, guiding the finer levels properly. — Christoph Rackwitz, Jul 12 '22 at 22:26
you could also try something based on features and feature matching. in that 3x3 tiled example, I'd expect one feature to match _many_ very similarly, and the positions of those would form a grid... -- also I've had this in the clipboard for a while, perhaps have a look: https://stackoverflow.com/questions/62946604/fitting-an-orthogonal-grid-to-noisy-coordinates — Christoph Rackwitz, Jul 12 '22 at 22:28
@ChristophRackwitz I currently do the autocorrelation of point sets si and tj by forming all differences vij = tj - si and mapping the vij into a raster grid, increasing the value of the grid point correspoding to vij by 1. This is equivalent to forming the g * f = ifft(fft(g) bar(fft(f))) = sum_ij delta(x- (tj -si)) for f = sum_i=1^m delta(x - si) and g = sum_j=1^n delta(x - tj) where delta(x) is the Dirac-delta function. Do I understand your comment right, that you propose to take instead f1 = f - m and g1 = g - n to get g1 * f1 = f * g - m g - n fmirr + m n where fmirr = sum_i delta(x+si)? — Jürgen Böhm, Jul 13 '22 at 15:22
this site seriously needs inline mathjax rendering capabilities. or maybe it's better this way. math is more of a code than source code is. I can't decipher what you're asking. part of it looks like convolution, but writing those sums there does the *opposite* of clarify. and those m,n aren't defined so I have no clue what subtracting them would accomplish. — Christoph Rackwitz, Jul 13 '22 at 15:28
@ChristophRackwitz g * f is meant as convolution of g and f. The f = sum_i=1^m delta(x - si) for a group of points s1,..,sm in the plane is the sum of the delta functions supported at the si, x is the coordinate in the plane. The equality g * f = sum_ij delta(x - (tj - si)) can be easily calculated. The m and n are the numbers of si and tj respectively (see the sum decorations). To subtract them gives functions of average 0 - at least this was the idea, but now I notice that this does not seem to be true. So back to the drawing board. — Jürgen Böhm, Jul 13 '22 at 15:32
I think all this excessive math formulation (I'm guessing you were subjected to that in Germany) obscures the problem. it's an implementation detail. you're dealing with a set of rectangles of various sizes, and you want to find similar/equal rectangles, and then determine if they (or a subset of them) can be interpreted to form a grid. given some data, this problem could be explored... -- correlation/convolution approaches suppose that you've rendered that vector description into a raster description. and I don't see that to be *necessary* when you can work on the vector data. — Christoph Rackwitz, Jul 13 '22 at 15:41
say, would this problem be aptly described as "detecting panels"? — Christoph Rackwitz, Jul 13 '22 at 15:43
In fact we speak of "images" (that are contained in "panels") but essentially you are right. — Jürgen Böhm, Jul 13 '22 at 15:45

Finding repeated pattern in a set of points in the plane

0 Answers0