Finding all cycles/enclosed shapes in a 2D grid

Question

I have an "infinite" 2D grid and I want to detect closed/complete "structures" - areas of any shape which are enclosed on all sides. However, I need to identify each individual closed circuit - including the larger shape, if any.

In researching this, I've discovered the cycle detection algorithm, but I don't see a clean/efficient way to separate the larger circuit from the smaller ones.

For example given the following two "complete" structures:

0 1 1 1 0
0 1 0 1 0
0 1 1 1 0
0 0 0 0 0
0 1 1 1 1 1
0 1 0 1 0 1
0 1 1 1 1 1

The first is a single cell enclosed by 8 "walls". The cycle detection makes it trivial to detect this.

The second example consists of two copies of example one but they share a wall. There are three separate circuits I care about - the left room, the right room, and the overall structure.

Multiple passes of a cycle algorithm might work, but I'd have to be sure I'm not retracing an already-found shape.

I've also looked at the flood fill algorithm, but it seems like it makes the assumption you already know a point inside the bounded area. With an infinite 2D grid I'd need a size limit to force it to give up if it's not in a valid structure.

Are there solutions I'm missing or have I missed something with my thinking?

I will only do this "check" when a boundary value is added. Using the example above, if I change any 0 -> 1, a new cycle has potentially been created and I'll run the logic. I do not care about identifying separate structures and will always have an origin coordinate.

I've been studying the solutions posted here but they're all based on already knowing which nodes are connected to other nodes. I've already toyed with logic that identifies each individual "line" and I can keep going from there, but it feels redundant.

How is it infinite? I mean, if there are a square of 1's off to the right a million places, how do you know? Are you just storing the location of the 1's? — Kevin, Jul 07 '17 at 21:18
By infinite, I mean these "structures" can be any shape, a reasonable size, and anywhere on an "infinite" 2D grid - which in this case is a procedurally-generated 2D tile map. — helion3, Jul 07 '17 at 21:24
Have a look at this excellent paper: "A linear-time component-labeling algorithm using contour tracing technique" by Fu Chang, Chun-Jen Chen, and Chi-Jen Lu. I guess that you can adapt the method to the case of an infinite grid by spiraling around a starting pixel for the initial scan. — , Jul 07 '17 at 21:27
Okay, let me phrase it this way. If there's a structure off a million places to the right, do you care about it? Is there a specific bounding box you want to enumerate the structures of, or do you want them all regardless of distance away? — Kevin, Jul 07 '17 at 21:30
@Kevin I should have mentioned that, sorry. I will only do this "check" when a boundary value is added. So using my example above, if someone changes a 0 -> 1 I will know that a new cycle has *potentially* been created and I'll run the logic. If someone set a random 1, no cycle exists. If someone just completed a square, a cycle exists. I do not care about identifying separate structures and will always have an origin coordinate. — helion3, Jul 07 '17 at 21:36
What is a closed structure? Multiple connected 1s surrounding one or more 0s? If the three 0s on the bottom of the column to the very left would be 1s, would we get 6 additional circuits? — maraca, Jul 12 '17 at 01:58
This question cannot be answered with so many details missing. See my comment 2 days ago. You don't even define what you mean by structure, circuit etc. If you count each unique circuit the number will explode. — maraca, Jul 14 '17 at 08:03

Spektre · Answer 1 · 2017-07-12T09:03:53.733

5

I would do this like this:

0 0 0 0 0 0 0
0 1 1 1 1 1 0
0 1 0 1 0 1 0
0 1 1 1 1 1 0
0 0 0 0 0 0 0

fill the background with 2

to determine if you are in background just cast a ray and count consequent zeores. Once you find location where the ray length is bigger then circuit size limit you got your start point.
```
[0]0-0-0-0-0-0
 0 1 1 1 1 1 0
 0 1 0 1 0 1 0
 0 1 1 1 1 1 0
 0 0 0 0 0 0 0

 2 2 2 2 2 2 2
 2 1 1 1 1 1 2
 2 1 0 1 0 1 2
 2 1 1 1 1 1 2
 2 2 2 2 2 2 2
```
Do not use unbound recursive flood fills for this !!! because for "infinite" area you will stack overflow. You can limit the recursion level and if reached instead of recursion add point to some que for further processing latter. This usually speeds thing up a lot and limits the stack usage...

find first 0

 2 2 2 2 2 2 2
 2 1 1 1 1 1 2
 2 1[0]1 0 1 2
 2 1 1 1 1 1 2
 2 2 2 2 2 2 2

flood fill it with 3

 2 2 2 2 2 2 2
 2 1 1 1 1 1 2
 2 1 3 1 0 1 2
 2 1 1 1 1 1 2
 2 2 2 2 2 2 2

select all 1 near 3

this is your circuit. If you remember the bbox while filling #3 then you need to scan only area enlarged by one cell on each side... Selected cells are your circuit.
```
 2 2 2 2 2 2 2
 2 * * * 1 1 2
 2 * 3 * 0 1 2
 2 * * * 1 1 2
 2 2 2 2 2 2 2
```

flood fill 3 with 2

this will avoid of usage already processed circuits

 2 2 2 2 2 2 2
 2 1 1 1 1 1 2
 2 1 2 1 0 1 2
 2 1 1 1 1 1 2
 2 2 2 2 2 2 2

loop #2 while any 0 found

change all 2 back to 0

 0 0 0 0 0 0 0
 0 1 1 1 1 1 0
 0 1 0 1 0 1 0
 0 1 1 1 1 1 0
 0 0 0 0 0 0 0

edited Jul 12 '17 at 09:03

answered Jul 12 '17 at 08:56

Spektre

49,595
11
110
380

OP states that object that contains both holes also need to be found: *"The second example consists of two copies of example one but they share a wall. There are three separate circuits I care about - the left room, the right room, and the overall structure."* Can you find external boundary with this approach? – Leonid Vasilev Jul 17 '17 at 15:46
1

@LeonidVasilyev external boundary is easy just select all the `1` neighboring `2` before bullet `#2` that will select all the outer circuits. But there is the need to avoid duplicate circuits latter on ... – Spektre Jul 17 '17 at 19:32
This is an interesting method but I don't believe it's right for me. What's key for me is that I will always have a starting point which I know is already somewhere along the circuit. Given that, it's more efficient to follow the path than to flood fill, partly because flood fills will always touch more cells than following the path has to, especially since you advise multiple fills. Judging by the steps listed it appears to be more complex and costly than the current solution. – helion3 Jul 20 '17 at 23:01
@helion3 the only expensive fill is the background and if you change your encoding that it has different code that the holes then you do not need it. By following edges you can miss the inner circuits unless all edge cases are handled properly. On the other hand you can use flood fill of new edge to get its bbox to speed up the scanning process considerably. So this can be done iiterativelly with complexity depending only on the area of newly added circuit. If you just change a circuit then you have to first remove the old one from the found list of circuits. – Spektre Jul 21 '17 at 06:44

Leonid Vasilev · Answer 2 · 2017-07-13T15:48:12.013

It is a contours finding problem.

One of possible algorithms is described by Satoshi Suzuki and Keiichi Abe in their paper called Topological Structural Analysis of Digitized Binary Images by Border Following in 1985. And it is not trivial. But you can use OpenCV, it's cv2.findContours() function implements this algorithm.

If you choose to use OpenCV, the solution is easy. You extract contours alongside it's hierarchy. Contours that has at least one child (hole) and their child contours are objects that you are looking for. Example using managed OpenCV wrapper called OpenCvSharp:

byte[,] a = new byte[7, 6]
{
    { 0, 1, 1, 1, 0, 0 },
    { 0, 1, 0, 1, 0, 0 },
    { 0, 1, 1, 1, 0, 0 },
    { 0, 0, 0, 0, 0, 0 },
    { 0, 1, 1, 1, 1, 1 },
    { 0, 1, 0, 1, 0, 1 },
    { 0, 1, 1, 1, 1, 1 }
};
// Clone the matrix if you want to keep original array unmodified.
using (var mat = new MatOfByte(a.GetLength(0), a.GetLength(1), a))
{
    // Turn 1 pixel values into 255.
    Cv2.Threshold(mat, mat, thresh: 0, maxval: 255, type: ThresholdTypes.Binary);
    // Note that in OpenCV Point.X is a matrix column index and Point.Y is a row index.
    Point[][] contours;
    HierarchyIndex[] hierarchy;
    Cv2.FindContours(mat, out contours, out hierarchy, RetrievalModes.CComp, ContourApproximationModes.ApproxNone);
    for (var i = 0; i < contours.Length; ++i)
    {
        var hasHole = hierarchy[i].Child > -1;
        if (hasHole)
        {
            var externalContour = contours[i];
            // Process external contour.
            var holeIndex = hierarchy[i].Child;
            do
            {
                var hole = contours[holeIndex];
                // Process hole.
                holeIndex = hierarchy[holeIndex].Next;
            }
            while (holeIndex > -1);
        }
    }
}

score 2 · Answer 3 · edited Sep 13 '17 at 22:24

Coming from a graph-theoretic view of the problem, you can interpret every 0 of your map as a node, neighboring 0s are connected with an edge. It sounds to me like what you want to do is compute the connected components of this graph (and maybe their connectivity by 1 values, to find 'neighboring rooms' of the same structure)

If you only want to compute this information once, a straightforward approach using a union-find data structure should suffice, where you apply union once per edge.

If you want to edit your map dynamically, the best approach based on the graph model would probably be some dynamic data structure that supports split or de-union operations, see for example here or here

score 2 · Answer 4 · edited Jul 26 '17 at 10:46

You can try a list of points and verify the ones that are linked.

class PointList : List<Point>
{
    /// <summary>
    /// Adds the point to the list and checks for perimeters
    /// </summary>
    /// <param name="point"></param>
    /// <returns>Returns true if it created at least one structure</returns>
    public bool AddAndVerify(Point point)
    {
        this.Add(point);

        bool result = LookForPerimeter(point, point, point);
        Console.WriteLine(result);
        return result;
    }

    private bool LookForPerimeter(Point point, Point last, Point original)
    {
        foreach (Point linked in this.Where(p => 
            (p.X == point.X -1 && p.Y == point.Y)
            || (p.X == point.X + 1 && p.Y == point.Y)
            || (p.X == point.X && p.Y == point.Y - 1)
            || (p.X == point.X && p.Y == point.Y + 1)
        ))
        {
            if (!linked.Equals(last))
            {
                if (linked == original) return true;

                bool subResult = LookForPerimeter(linked, point, original);
                if (subResult) return true;
            }
        }

        return false;
    }
}

This code is intentended as a starting point, it probably has bugs and does not account for perimeters without 0 inside

Example of use:

class Program
{
    static void Main(string[] args)
    {
        PointList list = new PointList();

        list.AddAndVerify(new Point() { X = 0, Y = 0 }); //returns false
        list.AddAndVerify(new Point() { X = 0, Y = 1 }); //returns false
        list.AddAndVerify(new Point() { X = 0, Y = 2 }); //returns false
        list.AddAndVerify(new Point() { X = 1, Y = 2 }); //returns false
        list.AddAndVerify(new Point() { X = 2, Y = 2 }); //returns false
        list.AddAndVerify(new Point() { X = 2, Y = 1 }); //returns false
        list.AddAndVerify(new Point() { X = 2, Y = 0 }); //returns false
        list.AddAndVerify(new Point() { X = 1, Y = 0 }); //returns True
    }
}

score 1 · Answer 5 · answered Jul 18 '17 at 17:46

I had a similar problem trying to find all circles inside a 2D street map graph (given as a SVG file). As you state I too could not find an algorithm for that.

I found the following solution though.

Assumptions

Grid Layout: Each '1' in the grid is in one of the following states (or an homomorphism of that):

1.   0      2.   0      3.   0      4.   0      5.   0      6.   1
   0 1 0       1 1 0       1 1 0       1 1 1       1 1 1       1 1 1
     0           0           1           0           1           1

But only example 3 to 6 make a sense for a connected wall, as in a connected wall each '1' has at least two '1's in its neighborhood.

Example 3 indicates a corner. This corner holds at most one structure.
Example 4 indicates a straight line. It can be the wall of zero, one or two structures.
Example 5 indicates a t-wall. It can be the wall of zero, one, two or three structures.
Example 6 indicates a cross-wall. It can be the corner of zero, one, two, three or four structures.

The Algorithm

The idea

Assuming the above, the algorithm works by finding a '1' and doing a depth first search, to mark all connected '1's. The traversed '1's are only marked, if the depth first search arrives at the starting position or at an already marked position.

The implementation

I will post an implementation the next few days for that one.

score 0 · Answer 6 · answered Jul 20 '17 at 23:16

Re-posting my solution with an explanation and some code.

It took a few days before any answers were posted I tried to find a solution and believe I've found one that works very well for my needs.

Since I always have a starting point, I walk the edges from that point and fork the list of visited points each time the path "branches" off - allowing me to find multiple cycles.

Given a 2D grid with either a 1 or 0 in a cell:

0 1 1 1 1 1
0 1 0 1 0 1
0 1 1 1 1 1

Starting from a cell I already know is a 1, I begin my search:

For the current valid point:
1. add it to a "visited" list
2. look for any valid neighbors (except for last point I visited, to avoid infinite loops)
For each valid neighbor:
1. clone the list of points which is our "trail" to this new point
2. call step 1 with the neighbor point

Cloning allows each "branch" to become a unique cycle without mixing points.

I haven't run any performance profiling, but it works very well given the examples I've thrown at it.

It's possible to give me two copies of a cycle. For example, if I start in the NW corner, cells to the east and south both have valid paths to follow. They're both treated as new paths and followed, but they're just mirror images of the same cycle. For now, I just prune cycles like these - they have exactly the same points, as long as you ignore their order.

There's also a bit of filtering involved - like for problem #1 and trimming points if the end point matches a visited point that wasn't where we started. I think that's pretty much unavoidable and isn't a big deal but if there was a clean way to avoid that I would. I can't know what "begins" a new cycle until I've found it though, so you know, linear time flow strikes again.

public class CycleDetection {
    // Cache found cycles
    List<Cycle> cycles = new List<Cycle>();

    // Provide public readonly access to our cycle list
    public ReadOnlyCollection<Cycle> Cycles {
        get { return new ReadOnlyCollection<Cycle>(cycles); }
    }

    // Steps/slopes that determine how we iterate grid points
    public Point[] Steps = new Point[] {
        new Point(1, 0),
        new Point(0, 1),
        new Point(-1, 0),
        new Point(0, -1)
    };

    // Cache our starting position
    Point origin;

    // Cache the validation function
    Func<Point, bool> validator;

    public CycleDetection(Point origin, Func<Point, bool> validator) {
        this.origin = origin;
        this.validator = validator;

        this.Scan();
    }

    // Activate a new scan.
    public void Scan() {
        cycles.Clear();

        if (validator(origin)) {
            Scan(new List<Point>(), origin);
        }
    }

    // Add a cycle to our final list.
    // This ensures the cycle doesn't already exist (compares points, ignoring order).
    void AddCycle(Cycle cycle) {
        // Cycles have reached some existing point in the trail, but not necessarily
        // the exact starting point. To filter out "strands" we find the index of
        // the actual starting point and skip points that came before it
        var index = cycle.Points.IndexOf(cycle.Points[cycle.Points.Count - 1]);

        // Make a new object with only the points forming the exact cycle
        // If the end point is the actual starting point, this has no effect.
        cycle = new Cycle(cycle.Points.Skip(index).ToList());

        // Add unless duplicate
        if (!cycles.Contains(cycle)) {
            cycles.Add(cycle);
        }
    }

    // Scan a new point and follow any valid new trails.
    void Scan(List<Point> trail, Point start) {
        // Cycle completed?
        if (trail.Contains(start)) {
            // Add this position as the end point
            trail.Add(start);

            // Add the finished cycle
            AddCycle(new Cycle(trail));

            return;
        }

        trail.Add(start);

        // Look for neighbors
        foreach (var step in Steps) {
            var neighbor = start + step;

            // Make sure the neighbor isn't the last point we were on... that'd be an infinite loop
            if (trail.Count >= 2 && neighbor.Equals(trail[trail.Count - 2])) {
                continue;
            }

            // If neighbor is new and matches
            if (validator(neighbor)) {
                // Continue the trail with the neighbor
                Scan(new List<Point>(trail), neighbor);
            }
        }
    }
}

I've posted the full source here: https://github.com/OutpostOmni/OmniGraph (includes some unrelated graph utils as well)

Finding all cycles/enclosed shapes in a 2D grid

6 Answers6

Assumptions

The Algorithm

The idea

The implementation