Iterate through 2d array of booleans and leave only the largest contiguous "2D blob of ones"

Question

Ok, so the question is kind of awkwardly phrased, but I hope this will clear things up.

I have this sample 2d array.

$array = array(
array(1, 0, 0, 0, 1, 0, 0, 1),
array(0, 0, 1, 1, 1, 1, 0, 1),
array(0, 1, 1, 0, 1, 0, 0, 0),
array(0, 1, 1, 0, 0, 0, 1, 0),
array(1, 0, 0, 0, 1, 1, 1, 1),
array(0, 1, 1, 0, 1, 0, 1, 0),
array(0, 0, 0, 0, 0, 0, 0, 1)
);

When iterated by rows (and terminating each row with \n), and for every row then iterated by column, it will echo something like this: (░░ = 0, ▓▓ = 1)

    ▓▓░░░░░░▓▓░░░░▓▓
    ░░░░▓▓▓▓▓▓▓▓░░▓▓
    ░░▓▓▓▓░░▓▓░░░░░░
    ░░▓▓▓▓░░░░░░▓▓░░
    ▓▓░░░░░░▓▓▓▓▓▓▓▓
    ░░▓▓▓▓░░▓▓░░▓▓░░
    ░░░░░░░░░░░░░░▓▓

But what I'd like to do is to "analyse" the array and only leave 1 contiguous shape (the one with the most "cells"), in this example, the result would be:

    ░░░░░░░░▓▓░░░░░░
    ░░░░▓▓▓▓▓▓▓▓░░░░
    ░░▓▓▓▓░░▓▓░░░░░░
    ░░▓▓▓▓░░░░░░░░░░
    ▓▓░░░░░░░░░░░░░░
    ░░▓▓▓▓░░░░░░░░░░
    ░░░░░░░░░░░░░░░░

My initial approach was to:

Assign each ▓▓ cell a unique number (be it completely random, or the current iteration number):

01      02    03
    04050607  08
  0910  11      
  1213      14  
15      16171819
  2021  22  23  
              24

Iterate through the array many, MANY times: every iteration, each ▓▓ cell assumes the largest unique number among his neighbours. The loop would go on indefinitely until there's no change detected between the current state and the previous state. After the last iteration, the result would be this:
```
01      21    08
    21212121  08
  2121  21      
  2121      24  
21      24242424
  2121  24  24  
              24
```
Now it all comes down to counting the value that occurs the most. Then, iterating once again, to turn all the cells whose value is not the most popular one, to 0, giving me the desired result.

However, I feel it's quite a roundabout and computationally heavy approach for such a simple task and there has to be a better way. Any ideas would be greatly appreciated, cheers!

BONUS POINTS: Divide all the blobs into an array of 2D arrays, ordered by number of cells, so we can do something with the smallest blob, too

It's a common algorithm to get the connection components of a graph. It takes at most as much iterations as the size of the biggest component. You cannot expect to get much better than that. — MrSmith42, Feb 13 '18 at 15:38
I think you should look at it as a matrix. You could find biggest blob by looking at neighbors of selected point. Each time remembering what points were used. I would use OOP for this. — Universus, Feb 13 '18 at 15:44
Read about https://en.wikipedia.org/wiki/Flood_fill and https://en.wikipedia.org/wiki/Connected-component_labeling — גלעד ברקן, Feb 13 '18 at 16:02

MrSmith42 · Answer 1 · 2018-02-22T09:05:13.837

2

I can only think of a few minor improvements:

Keep a linked list of the not empty fields. In step 2 you do not need to touch n² matrix-elements, you only need to touch the ones in your linked list. Which might be much less depending how sparse your matrix is.
You only need to compare to the right, right-down, left-down and down directions. Otherwise The other directions are already checked from the former row/column. What I mean: When I am greater that my right neighbour, I can already change the number of the right neighbour. (same for down and right-down). This halfs the number of compairs.

edited Feb 22 '18 at 09:05

answered Feb 13 '18 at 15:45

MrSmith42

9,961
6
38
49

No. 2 isn't correct. It's not that clear from the example, but OP also wants a diagonal connection. So if top-right connects, it is a connection, even if top, and right don't connect. – Michel Feb 21 '18 at 13:44
@Michel: You were right. I added the missing *down-left* case to my answer. – MrSmith42 Feb 22 '18 at 09:06

score 2 · Answer 2 · answered Feb 14 '18 at 12:04

Always fun, these problems. And done before, so I'll dump my code here, maybe you can use some of it. This basically follows every shape by looking at a cell and its surrounding 8 cells, and if they connect go to the connecting cell, look again and so on...

<?php 
$shape_nr=1;
$ln_max=count($array);
$cl_max=count($array[0]);
$done=[];

//LOOP ALL CELLS, GIVE 1's unique number
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
    if($array[$ln][$cl]===0)continue;
    $array[$ln][$cl] = ++$shape_nr;
}}

//DETECT SHAPES
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
    if($array[$ln][$cl]===0)continue;

    $shape_nr=$array[$ln][$cl];
    if(in_array($shape_nr,$done))continue;

    look_around($ln,$cl,$ln_max,$cl_max,$shape_nr,$array);
    //SET SHAPE_NR to DONE, no need to look at that number again
    $done[]=$shape_nr;
}}  

//LOOP THE ARRAY and COUNT SHAPENUMBERS
$res=array();
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
    if($array[$ln][$cl]===0)continue;
    if(!isset($res[$array[$ln][$cl]]))$res[$array[$ln][$cl]]=1;
    else $res[$array[$ln][$cl]]++;
}}

//get largest shape
$max = max($res);
$shape_value_max = array_search ($max, $res);

//get smallest shape
$min = min($res);
$shape_value_min = array_search ($min, $res);

// recursive function: detect connecting cells  
function look_around($ln,$cl,$ln_max,$cl_max,$nr,&$array){
    //create mini array
    $mini=mini($ln,$cl,$ln_max,$cl_max);
    if($mini===false)return false;

    //loop surrounding cells
    foreach($mini as $v){
        if($array[$v[0]][$v[1]]===0){continue;}
        if($array[$v[0]][$v[1]]!==$nr){
            // set shape_nr of connecting cell
            $array[$v[0]][$v[1]]=$nr;

            // follow the shape
            look_around($v[0],$v[1],$ln_max,$cl_max,$nr,$array);
            }
        }
    return $nr;
    }

// CREATE ARRAY WITH THE 9 SURROUNDING CELLS
function mini($ln,$cl,$ln_max,$cl_max){
    $look=[];   
    $mini=[[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]];
    foreach($mini as $v){
        if( $ln + $v[0] >= 0 &&
            $ln + $v[0] < $ln_max &&
            $cl + $v[1] >= 0 &&
            $cl + $v[1] < $cl_max
            ){
            $look[]=[$ln + $v[0], $cl + $v[1]];
            }
        }

    if(count($look)===0){return false;}
    return $look;
    }

Here's a fiddle

score 1 · Answer 3 · answered Feb 13 '18 at 15:45

1

If your array size isn't huge and memory won't be a problem maybe a recursive solution would be faster. I found a c++ algorithm that does this here: https://www.geeksforgeeks.org/find-length-largest-region-boolean-matrix/

answered Feb 13 '18 at 15:45

Mark D

233
3
17

Iterate through 2d array of booleans and leave only the largest contiguous "2D blob of ones"

3 Answers3