Consider the following problem: You have data that describes a rectangular Printed-Circuit-Board (PCB), for example a list of polygons in the plane with type information (that can be represented by a color scheme).
The PCB has a repeated pattern structure, that is there exists a rectangle R0 and vectors v1,...,vN. These typical form a finite lattice, that is they are
p w1 + q w2 + w0
with integers 0 <= p <= P and 0 <= q <= Q and w1, w2 orthogonal and parallel to the coordinate axes.
The PCB consists of repetitions of the part covered by R0, which is repeated in the translated rectangles vi + R0 (the vi + R0 are pairwise non-overlapping, of course).
Now the task is, to recover from a description of the whole PCB the tile rectangle R0 and the translations vi (or to put it otherwise, to find the list of rectangles vi + R0).
I simplified the problem a bit, by taking from each polygon pi a point P(pi) from the plane R^2 such that, if the polygon pj is a translation v + pi of a polygon pi, then P(pj) = v + P(pi).
Also every point occurring as such P(pi) is to be considered "colored" (= marked by an integer number) such that equally colored polygons create equally colored points.
So you have as initial data a set (p1, c1),...., (pM, cM) of colored points in the plane R^2 and again your task is to find the repeated tile and its translation vectors.
Of course the problem is not wholly well defined, if posed this way: If you have e.g a 10 x 20 grid of copies of R0, you could also view it as a 5 x 10 grid of a square made up of four copies of R0 or as a 5 x 20 grid of two copies of R0 above each other, and so on.
So add the assumption, that R0 itself should not be made from a lattice in the above sense by a smaller tile R01, part of R0.
The algorithm I currently use is the following:
- Choose a so called "seed rectangle" Rs in the following way:
1a) Put a grid G of size (gx, gy) over the whole arrangement of points. Initialize each grid point with the number 0.
1b) For every point p from the total point set, find its gridpoint (px, py).
1c) In the square [px - d, px + d] x [py - d, py + d] of grid points increase the values of the grid points by 1.
(So 1a-c are a "smear out" of the points, snapped onto a grid).
1d) Find the point (ix, iy) in the grid G with maximal value valmax and consider the list L of points in the grid with at least 90% (or 95%, this is a parameter one can play with) of this value. Select from this list the point p0 = (ix0, iy0) which is lexicographically most upward and left.
1e) Form a "blob" around p0 which consists of the connected component of all grid points with values above 50% of valmax which contains p0. Take the enclosing rectangle of this connected component.
1f) Take this rectangle as seed rectangle Rs.
- Find all occurrences of the seed rectangle in the total point set, by doing a form of cross-correlation.
This gives a list of assumed translation vectors v1,...,vN
- Now go back to the seed rectangle Rs and try iteratively to enlarge it to a rectangle Rs' under the side-condition, that all translations vi + Rs' must still be present in the total point set.
If this enlargement stops, the current Rs' is assumed to be R0.
This algorithm works on many examples, for example on this one
The red rectangles are the translations of the seed rectangle Rs from above. The blue rectangle is the found R0.
The grid where the points are smeared out (Step 1 from above) looks like
(Here points above 50% of the maximal value are red, below 25% of the maximal value are yellow).
But here it fails
Of course a tiling into 2 x 6 tiles should be found, but instead the seed rectangle is found 192 times, 16 times repeated inside the actual tile. Consequently the correct actual tile is not found, because the enlargement process (Step 3 above) fails too early.
I abstracted this a bit in the following picture
Here filled rectangles of identical size and color should stand for identical point patterns. The intended tiling is of course a 2 x 3 tiling.
It makes clear that in a certain sense, all attempts to find a good "seed rectangle" from analyzing only parts of the whole image (to save computing time) are "doomed". So, if one analyzes in the above picture the content of the rectangle A with the black border, one would choose for example the seed rectangle surrounded by B and find a lot of repetitions of this downwards - most of these unwanted and leading to a failure of the algorithm in Step 3. A seed rectangle surrounded by C would be ok.
So my question is: What could be a better algorithm for finding the rectangle R0 or, alternatively, a seed rectangle Rs from which R0 will always be found correctly?
Remark: Simply doing a cross-correlation of the whole image with itself unfortunately does not always solve the problem. In the failing example above, a small downward shift will typically have a greater correlation value with the image than a comparatively larger shift which moves the first row onto the second row, but the last one is that which should be found.