4

Ok, so the question is kind of awkwardly phrased, but I hope this will clear things up.

I have this sample 2d array.

$array = array(
array(1, 0, 0, 0, 1, 0, 0, 1),
array(0, 0, 1, 1, 1, 1, 0, 1),
array(0, 1, 1, 0, 1, 0, 0, 0),
array(0, 1, 1, 0, 0, 0, 1, 0),
array(1, 0, 0, 0, 1, 1, 1, 1),
array(0, 1, 1, 0, 1, 0, 1, 0),
array(0, 0, 0, 0, 0, 0, 0, 1)
);

When iterated by rows (and terminating each row with \n), and for every row then iterated by column, it will echo something like this: (░░ = 0, ▓▓ = 1)

    ▓▓░░░░░░▓▓░░░░▓▓
    ░░░░▓▓▓▓▓▓▓▓░░▓▓
    ░░▓▓▓▓░░▓▓░░░░░░
    ░░▓▓▓▓░░░░░░▓▓░░
    ▓▓░░░░░░▓▓▓▓▓▓▓▓
    ░░▓▓▓▓░░▓▓░░▓▓░░
    ░░░░░░░░░░░░░░▓▓

But what I'd like to do is to "analyse" the array and only leave 1 contiguous shape (the one with the most "cells"), in this example, the result would be:

    ░░░░░░░░▓▓░░░░░░
    ░░░░▓▓▓▓▓▓▓▓░░░░
    ░░▓▓▓▓░░▓▓░░░░░░
    ░░▓▓▓▓░░░░░░░░░░
    ▓▓░░░░░░░░░░░░░░
    ░░▓▓▓▓░░░░░░░░░░
    ░░░░░░░░░░░░░░░░

My initial approach was to:

  1. Assign each ▓▓ cell a unique number (be it completely random, or the current iteration number):

    01      02    03
        04050607  08
      0910  11      
      1213      14  
    15      16171819
      2021  22  23  
                  24
    
  2. Iterate through the array many, MANY times: every iteration, each ▓▓ cell assumes the largest unique number among his neighbours. The loop would go on indefinitely until there's no change detected between the current state and the previous state. After the last iteration, the result would be this:

    01      21    08
        21212121  08
      2121  21      
      2121      24  
    21      24242424
      2121  24  24  
                  24
    

    Now it all comes down to counting the value that occurs the most. Then, iterating once again, to turn all the cells whose value is not the most popular one, to 0, giving me the desired result.

However, I feel it's quite a roundabout and computationally heavy approach for such a simple task and there has to be a better way. Any ideas would be greatly appreciated, cheers!

BONUS POINTS: Divide all the blobs into an array of 2D arrays, ordered by number of cells, so we can do something with the smallest blob, too

Nikedemos
  • 41
  • 2
  • It's a common algorithm to get the connection components of a graph. It takes at most as much iterations as the size of the biggest component. You cannot expect to get much better than that. – MrSmith42 Feb 13 '18 at 15:38
  • I think you should look at it as a matrix. You could find biggest blob by looking at neighbors of selected point. Each time remembering what points were used. I would use OOP for this. – Universus Feb 13 '18 at 15:44
  • Read about https://en.wikipedia.org/wiki/Flood_fill and https://en.wikipedia.org/wiki/Connected-component_labeling – גלעד ברקן Feb 13 '18 at 16:02

3 Answers3

2

I can only think of a few minor improvements:

  1. Keep a linked list of the not empty fields. In step 2 you do not need to touch n² matrix-elements, you only need to touch the ones in your linked list. Which might be much less depending how sparse your matrix is.

  2. You only need to compare to the right, right-down, left-down and down directions. Otherwise The other directions are already checked from the former row/column. What I mean: When I am greater that my right neighbour, I can already change the number of the right neighbour. (same for down and right-down). This halfs the number of compairs.

MrSmith42
  • 9,961
  • 6
  • 38
  • 49
  • No. 2 isn't correct. It's not that clear from the example, but OP also wants a diagonal connection. So if top-right connects, it is a connection, even if top, and right don't connect. – Michel Feb 21 '18 at 13:44
  • @Michel: You were right. I added the missing *down-left* case to my answer. – MrSmith42 Feb 22 '18 at 09:06
2

Always fun, these problems. And done before, so I'll dump my code here, maybe you can use some of it. This basically follows every shape by looking at a cell and its surrounding 8 cells, and if they connect go to the connecting cell, look again and so on...

<?php 
$shape_nr=1;
$ln_max=count($array);
$cl_max=count($array[0]);
$done=[];

//LOOP ALL CELLS, GIVE 1's unique number
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
    if($array[$ln][$cl]===0)continue;
    $array[$ln][$cl] = ++$shape_nr;
}}

//DETECT SHAPES
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
    if($array[$ln][$cl]===0)continue;

    $shape_nr=$array[$ln][$cl];
    if(in_array($shape_nr,$done))continue;

    look_around($ln,$cl,$ln_max,$cl_max,$shape_nr,$array);
    //SET SHAPE_NR to DONE, no need to look at that number again
    $done[]=$shape_nr;
}}  

//LOOP THE ARRAY and COUNT SHAPENUMBERS
$res=array();
for($ln=0;$ln<$ln_max;++$ln){
for($cl=0;$cl<$cl_max;++$cl){
    if($array[$ln][$cl]===0)continue;
    if(!isset($res[$array[$ln][$cl]]))$res[$array[$ln][$cl]]=1;
    else $res[$array[$ln][$cl]]++;
}}

//get largest shape
$max = max($res);
$shape_value_max = array_search ($max, $res);

//get smallest shape
$min = min($res);
$shape_value_min = array_search ($min, $res);

// recursive function: detect connecting cells  
function look_around($ln,$cl,$ln_max,$cl_max,$nr,&$array){
    //create mini array
    $mini=mini($ln,$cl,$ln_max,$cl_max);
    if($mini===false)return false;

    //loop surrounding cells
    foreach($mini as $v){
        if($array[$v[0]][$v[1]]===0){continue;}
        if($array[$v[0]][$v[1]]!==$nr){
            // set shape_nr of connecting cell
            $array[$v[0]][$v[1]]=$nr;

            // follow the shape
            look_around($v[0],$v[1],$ln_max,$cl_max,$nr,$array);
            }
        }
    return $nr;
    }

// CREATE ARRAY WITH THE 9 SURROUNDING CELLS
function mini($ln,$cl,$ln_max,$cl_max){
    $look=[];   
    $mini=[[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]];
    foreach($mini as $v){
        if( $ln + $v[0] >= 0 &&
            $ln + $v[0] < $ln_max &&
            $cl + $v[1] >= 0 &&
            $cl + $v[1] < $cl_max
            ){
            $look[]=[$ln + $v[0], $cl + $v[1]];
            }
        }

    if(count($look)===0){return false;}
    return $look;
    }

Here's a fiddle

Michel
  • 4,076
  • 4
  • 34
  • 52
1

If your array size isn't huge and memory won't be a problem maybe a recursive solution would be faster. I found a c++ algorithm that does this here: https://www.geeksforgeeks.org/find-length-largest-region-boolean-matrix/

Mark D
  • 233
  • 3
  • 17