27

I have asked a similar question some days ago, but I have yet to find an efficient way of solving my problem. I'm developing a simple console game, and I have a 2D array like this:

1,0,0,0,1
1,1,0,1,1
0,1,0,0,1
1,1,1,1,0
0,0,0,1,0

I am trying to find all the areas that consist of neighboring 1's (4-way connectivity). So, in this example the 2 areas are as following:

1
1,1
  1
1,1,1,1
      1

and :

       1
     1,1
       1

The algorithm, that I've been working on, finds all the neighbors of the neighbors of a cell and works perfectly fine on this kind of matrices. However, when I use bigger arrays (like 90*90) the program is very slow and sometimes the huge arrays that are used cause stack overflows.

One guy on my other question told me about connected-component labelling as an efficient solution to my problem.

Can somebody show me any C++ code which uses this algorithm, because I'm kinda confused about how it actually works along with this disjoint-set data structure thing...

Thanks a lot for your help and time.

Royi
  • 4,640
  • 6
  • 46
  • 64

5 Answers5

32

I'll first give you the code and then explain it a bit:

// direction vectors
const int dx[] = {+1, 0, -1, 0};
const int dy[] = {0, +1, 0, -1};

// matrix dimensions
int row_count;
int col_count;

// the input matrix
int m[MAX][MAX];

// the labels, 0 means unlabeled
int label[MAX][MAX];

void dfs(int x, int y, int current_label) {
  if (x < 0 || x == row_count) return; // out of bounds
  if (y < 0 || y == col_count) return; // out of bounds
  if (label[x][y] || !m[x][y]) return; // already labeled or not marked with 1 in m

  // mark the current cell
  label[x][y] = current_label;

  // recursively mark the neighbors
  for (int direction = 0; direction < 4; ++direction)
    dfs(x + dx[direction], y + dy[direction], current_label);
}

void find_components() {
  int component = 0;
  for (int i = 0; i < row_count; ++i) 
    for (int j = 0; j < col_count; ++j) 
      if (!label[i][j] && m[i][j]) dfs(i, j, ++component);
}

This is a common way of solving this problem.

The direction vectors are just a nice way to find the neighboring cells (in each of the four directions).

The dfs function performs a depth-first-search of the grid. That simply means it will visit all the cells reachable from the starting cell. Each cell will be marked with current_label

The find_components function goes through all the cells of the grid and starts a component labeling if it finds an unlabeled cell (marked with 1).

This can also be done iteratively using a stack. If you replace the stack with a queue, you obtain the bfs or breadth-first-search.

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
gus
  • 831
  • 6
  • 9
  • I got an integer stack overflow with a 512x512 grid @ this line `dfs(x + dx[direction], y + dy[direction], current_label);` – Vignesh Aug 04 '17 at 04:41
  • @Vignesh Maybe your OS is limiting your maximum stack size. If you're running linux, try doing `ulimit -s unlimited` and then running the program. Another option is to implement the search using a stack. – gus Aug 12 '17 at 13:38
  • Thanks @gus, will check it out! – Vignesh Aug 14 '17 at 02:10
15

This can be solved with union find (although DFS, as shown in the other answer, is probably a bit simpler).

The basic idea behind this data structure is to repeatedly merge elements in the same component. This is done by representing each component as a tree (with nodes keeping track of their own parent, instead of the other way around), you can check whether 2 elements are in the same component by traversing to the root node and you can merge nodes by simply making the one root the parent of the other root.

A short code sample demonstrating this:

const int w = 5, h = 5;
int input[w][h] =  {{1,0,0,0,1},
                    {1,1,0,1,1},
                    {0,1,0,0,1},
                    {1,1,1,1,0},
                    {0,0,0,1,0}};
int component[w*h];

void doUnion(int a, int b)
{
    // get the root component of a and b, and set the one's parent to the other
    while (component[a] != a)
        a = component[a];
    while (component[b] != b)
        b = component[b];
    component[b] = a;
}

void unionCoords(int x, int y, int x2, int y2)
{
    if (y2 < h && x2 < w && input[x][y] && input[x2][y2])
        doUnion(x*h + y, x2*h + y2);
}

int main()
{
    for (int i = 0; i < w*h; i++)
        component[i] = i;
    for (int x = 0; x < w; x++)
    for (int y = 0; y < h; y++)
    {
        unionCoords(x, y, x+1, y);
        unionCoords(x, y, x, y+1);
    }

    // print the array
    for (int x = 0; x < w; x++)
    {
        for (int y = 0; y < h; y++)
        {
            if (input[x][y] == 0)
            {
                cout << ' ';
                continue;
            }
            int c = x*h + y;
            while (component[c] != c) c = component[c];
            cout << (char)('a'+c);
        }
        cout << "\n";
    }
}

Live demo.

The above will show each group of ones using a different letter of the alphabet.

p   i
pp ii
 p  i
pppp 
   p 

It should be easy to modify this to get the components separately or get a list of elements corresponding to each component. One idea is to replace cout << (char)('a'+c); above with componentMap[c].add(Point(x,y)) with componentMap being a map<int, list<Point>> - each entry in this map will then correspond to a component and give a list of points.

There are various optimisations to improve the efficiency of union find, the above is just a basic implementation.

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
  • Thanks a lot for this. It looks that it is exactly what I was trying to find! –  Jan 22 '13 at 19:58
  • One question : Isn't the int component[w*h]; part supposed to be : int component [w*y]; ? –  Jan 22 '13 at 20:18
  • @user1981497 Actually `bool input[w][y];` was supposed to be `bool input[w][h];` (fixed it). `w` is width and `h` is height. – Bernhard Barker Jan 22 '13 at 20:25
  • Why can't I use a secondary 2d array with the labels instead of the union find structure? –  Jan 23 '13 at 08:10
  • @user1981497 Do you mean use a 2d rather than a 1d array? You can, but then you'll need to convert the value at a cell to coordinates (unless you use a coordinate class, which takes up way more space than an int), and that just seems like it complicates things a little more, and this breaks away from the standard union-find implementation, and my thoughts are that you should only break away from the standard way if there are clear advantages. – Bernhard Barker Jan 23 '13 at 08:28
  • Yeah, I should maybe stick to the union find way. One more question, Iam trying the code now but it doesn't compile. Is there any place where I can post it so you can check it? –  Jan 23 '13 at 08:31
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/23200/discussion-between-user1981497-and-dukeling) –  Jan 23 '13 at 08:41
  • One final question... Can you explain what the union function does? –  Jan 23 '13 at 13:04
  • @user1981497 Think of union-find as starting off as a bunch of size 1 trees and each union combines 2 trees (look [here](http://scientopia.org/blogs/goodmath/2007/09/06/graph-searches-and-disjoint-sets-the-union-find-problem/) for a visualization) – Bernhard Barker Jan 23 '13 at 13:16
  • Yes I get it, but why do we feed union these values : union(x*h + y, x2*h + y2); ? Thanks –  Jan 23 '13 at 13:19
  • @user1981497 That's just a way to convert indices for a 2D array to a indices for a 1D array. – Bernhard Barker Jan 23 '13 at 13:27
  • So you mean that we convert the 2d array cells to 1d array cells, right? That it's kind of a formula... –  Jan 23 '13 at 13:38
  • Yes, it's just a formula for 2D to 1D array cells conversion. – Bernhard Barker Jan 23 '13 at 13:57
  • Sorry if I am so persistent on this question, but I still can't understand why you use the union function. I've watched this entire lecture about disjoint data sets(http://www.youtube.com/watch?v=wSPAjGfDl7Q) but I can't understand what the algorithm does in the 2d matrix. Thanks –  Jan 23 '13 at 16:06
  • Wouldn't the find algorithm be more appropriate(find the origin of all cells)? –  Jan 23 '13 at 16:08
  • OK now I get it. So, this algorithm is supposed to **find** the root of every cell and **unify** all the blobs to one root. Is this correct? Thanks –  Jan 23 '13 at 16:59
  • It turned out that I will need to extract the blobs, so yeah, the union find isn't so useful. I think I am going to use gustav's solution instead... Thanks anyway. –  Jan 23 '13 at 17:11
  • Sorry for reviving such old post, but I've been tinkering with your code today for another use and read the disjoint sets in Into to Algorithms.Why couldn't the while statements be replaced with if's? I'm very curious.... Anyway for this problem, I finally used gustav's solution, but yours was nice too! –  Jan 29 '13 at 20:37
  • If you use an if statement you wouldn't traverse all the way up the tree, you'd only look at the direct parent. Look at a visualization such as [this](http://scientopia.org/blogs/goodmath/2007/09/06/graph-searches-and-disjoint-sets-the-union-find-problem/) to understand better. – Bernhard Barker Jan 29 '13 at 20:52
  • Yeah I understand this, but isnt the while statement supposed to find the root which was previously marked? If so, a = component[a]; wouldn't need multiple iterations.... Thanks –  Jan 29 '13 at 21:02
  • Yeah I got it...you are right, because the value of a changes before each test. –  Jan 29 '13 at 21:06
  • Adding this comment for someone looking for a bit more robust solution given limited computing power. This method has never failed to give results in my 4 year old lenovo laptop. DFS fails for big datasets, but you may have more luck using 64 bit ints. – Vignesh Nov 08 '17 at 14:27
0

You could also try this transitive closure approach, however the triple loop for the transitive closure slows things up when there are many separated objects in the image, suggested code changes welcome

Cheers

Dave

void CC(unsigned char* pBinImage, unsigned char* pOutImage, int width, int height, int     CON8)
{
int i, j, x, y, k, maxIndX, maxIndY,  sum, ct, newLabel=1, count, maxVal=0, sumVal=0, maxEQ=10000;
int *eq=NULL, list[4];
int bAdd;

memcpy(pOutImage, pBinImage, width*height*sizeof(unsigned char));

unsigned char* equivalences=(unsigned char*) calloc(sizeof(unsigned char), maxEQ*maxEQ);

// modify labels this should be done with iterators to modify elements
// current column
for(j=0; j<height; j++)
{
    // current row
    for(i=0; i<width; i++)
    {
        if(pOutImage[i+j*width]>0)
        {
            count=0;

            // go through blocks
            list[0]=0;
            list[1]=0;
            list[2]=0;
            list[3]=0;

            if(j>0)
            {
                if((i>0))
                {
                    if((pOutImage[(i-1)+(j-1)*width]>0) && (CON8 > 0))
                        list[count++]=pOutImage[(i-1)+(j-1)*width];
                }

                if(pOutImage[i+(j-1)*width]>0)
                {
                    for(x=0, bAdd=true; x<count; x++)
                    {
                        if(pOutImage[i+(j-1)*width]==list[x])
                            bAdd=false;
                    }

                    if(bAdd)
                        list[count++]=pOutImage[i+(j-1)*width];
                }

                if(i<width-1)
                {
                    if((pOutImage[(i+1)+(j-1)*width]>0) && (CON8 > 0))
                    {
                        for(x=0, bAdd=true; x<count; x++)
                        {
                            if(pOutImage[(i+1)+(j-1)*width]==list[x])
                                bAdd=false;
                        }

                        if(bAdd)
                            list[count++]=pOutImage[(i+1)+(j-1)*width];
                    }
                }
            }

            if(i>0)
            {
                if(pOutImage[(i-1)+j*width]>0)
                {
                    for(x=0, bAdd=true; x<count; x++)
                    {
                        if(pOutImage[(i-1)+j*width]==list[x])
                            bAdd=false;
                    }

                    if(bAdd)
                        list[count++]=pOutImage[(i-1)+j*width];
                }
            }

            // has a neighbour label
            if(count==0)
                pOutImage[i+j*width]=newLabel++;
            else
            {
                pOutImage[i+j*width]=list[0];

                if(count>1)
                {
                    // store equivalences in table
                    for(x=0; x<count; x++)
                        for(y=0; y<count; y++)
                            equivalences[list[x]+list[y]*maxEQ]=1;
                }

            }
        }
    }
}

 // floyd-Warshall algorithm - transitive closure - slow though :-(
 for(i=0; i<newLabel; i++)
    for(j=0; j<newLabel; j++)
    {
        if(equivalences[i+j*maxEQ]>0)
        {
            for(k=0; k<newLabel; k++)
            {
                equivalences[k+j*maxEQ]= equivalences[k+j*maxEQ] || equivalences[k+i*maxEQ];
            }
        }
    }


eq=(int*) calloc(sizeof(int), newLabel);

for(i=0; i<newLabel; i++)
    for(j=0; j<newLabel; j++)
    {
        if(equivalences[i+j*maxEQ]>0)
        {
            eq[i]=j;
            break;
        }
    }


free(equivalences);

// label image with equivalents
for(i=0; i<width*height; i++)
{
    if(pOutImage[i]>0&&eq[pOutImage[i]]>0)
        pOutImage[i]=eq[pOutImage[i]];
}

free(eq);
}
ejectamenta
  • 1,047
  • 17
  • 20
0

very useful Document => https://docs.google.com/file/d/0B8gQ5d6E54ZDM204VFVxMkNtYjg/edit

java application - open source - extract objects from image - connected componen labeling => https://drive.google.com/file/d/0B8gQ5d6E54ZDTVdsWE1ic2lpaHM/edit?usp=sharing

    import java.util.ArrayList;

public class cclabeling

{

 int neighbourindex;ArrayList<Integer> Temp;

 ArrayList<ArrayList<Integer>> cc=new ArrayList<>();

 public int[][][] cclabel(boolean[] Main,int w){

 /* this method return array of arrays "xycc" each array contains 

 the x,y coordinates of pixels of one connected component 

 – Main => binary array of image 

 – w => width of image */

long start=System.nanoTime();

int len=Main.length;int id=0;

int[] dir={-w-1,-w,-w+1,-1,+1,+w-1,+w,+w+1};

for(int i=0;i<len;i+=1){

if(Main[i]){

Temp=new ArrayList<>();

Temp.add(i);

for(int x=0;x<Temp.size();x+=1){

id=Temp.get(x);

for(int u=0;u<8;u+=1){

neighbourindex=id+dir[u];

 if(Main[neighbourindex]){ 

 Temp.add(neighbourindex);

 Main[neighbourindex]=false;

 }

 }

Main[id]=false;

}

cc.add(Temp);

    }

}

int[][][] xycc=new int[cc.size()][][];

int x;int y;

for(int i=0;i<cc.size();i+=1){

 xycc[i]=new int[cc.get(i).size()][2];



 for(int v=0;v<cc.get(i).size();v+=1){

 y=Math.round(cc.get(i).get(v)/w);

 x=cc.get(i).get(v)-y*w;

 xycc[i][v][0]=x;

 xycc[i][v][1]=y;

 }



}

long end=System.nanoTime();

long time=end-start;

System.out.println("Connected Component Labeling Time =>"+time/1000000+" milliseconds");

System.out.println("Number Of Shapes => "+xycc.length);

 return xycc;



 }

}
ostaz
  • 1
  • 1
0

Please find below the sample code for connected component labeling . The code is written in JAVA

package addressextraction;

public class ConnectedComponentLabelling {

    int[] dx={+1, 0, -1, 0};
    int[] dy={0, +1, 0, -1};
    int row_count=0;
    int col_count=0;
    int[][] m;
    int[][] label;

    public ConnectedComponentLabelling(int row_count,int col_count) {
        this.row_count=row_count;
        this.col_count=col_count;
        m=new int[row_count][col_count];
        label=new int[row_count][col_count];
    }

    void dfs(int x, int y, int current_label) {
          if (x < 0 || x == row_count) return; // out of bounds
          if (y < 0 || y == col_count) return; // out of bounds
          if (label[x][y]!=0 || m[x][y]!=1) return; // already labeled or not marked with 1 in m

          // mark the current cell
          label[x][y] = current_label;
         // System.out.println("****************************");

          // recursively mark the neighbors
          int direction = 0;
          for (direction = 0; direction < 4; ++direction)
            dfs(x + dx[direction], y + dy[direction], current_label);
        }

    void find_components() {
          int component = 0;
          for (int i = 0; i < row_count; ++i) 
            for (int j = 0; j < col_count; ++j) 
              if (label[i][j]==0 && m[i][j]==1) dfs(i, j, ++component);
        }


    public static void main(String[] args) {
        ConnectedComponentLabelling l=new ConnectedComponentLabelling(4,4);
        l.m[0][0]=0;
        l.m[0][1]=0;
        l.m[0][2]=0;
        l.m[0][3]=0;

        l.m[1][0]=0;
        l.m[1][1]=1;
        l.m[1][2]=0;
        l.m[1][3]=0;

        l.m[2][0]=0;
        l.m[2][1]=0;
        l.m[2][2]=0;
        l.m[2][3]=0;

        l.m[3][0]=0;
        l.m[3][1]=1;
        l.m[3][2]=0;
        l.m[3][3]=0;

        l.find_components();

        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                System.out.print(l.label[i][j]);
            }
            System.out.println("");

        }


    }

}
Nithin
  • 9,661
  • 14
  • 44
  • 67