3

I have a 3D array, for example:

array = np.array([[[ 1.,  1.,  0.],
                   [ 0.,  0.,  0.],
                   [ 0.,  0.,  0.]],
                  [[1.,  0.,  0.],
                   [ 0.,  0.,  0.],
                   [ 0.,  0.,  0.]],
                  [[ 1.,  0.,  0.],
                   [ 0.,  0.,  0.],
                   [ 0.,  0.,  0.]]])

I need to find the runs of conected ones for each possible size for different directions. For example, in the direction (1,0,0), horitzontal direction, the output should be:

> **Runs of size 1: 2
> **Runs of size 2: 1
> **Runs of size 3: 0

On the direction (0, 1, 0), accross the vertical axis:

> **Runs of size 1: 4
> **Runs of size 2: 0
> **Runs of size 3: 0

And in the direction (0, 0, 1), accross the z axis:

> **Runs of size 1: 1
> **Runs of size 2: 0
> **Runs of size 3: 1

Is there an effective way to implement this?

EDIT:

I attach the solution I am working on:

dire = np.array(([1, 0, 0], [0, 1, 0], [0, 0, 1])) # Possible directions
results = np.zeros((array.shape[0], len(dire)))  # Maximum runs will be 3

# Find all indices
indices = np.argwhere(array == 1) 

# Loop over each direction                 
for idire, dire in enumerate(dire):
    results[0, idire] = np.count_nonzero(array) # Count all 1 (conection 0)
    indices_to_compare = indices    

# Now we compare the list of indices with a displaced list of indices
    for irun in range(1, array.shape[0]):   
        indices_displaced = indices + dire*irun             
        aset = set([tuple(x) for x in indices_displaced])
        bset = set([tuple(x) for x in indices_to_compare])
        indices_to_compare = (np.array([x for x in aset & bset]))
        results[irun, idire] = len(indices_to_compare)

    # Rest the counts off bigger runs to the smaller ones, for duplicates
    for indx in np.arange(array.shape[0]-2,-1,-1):
        results[indx, idire] -=            
        np.sum(results[np.arange(array.shape[0]-1,indx,-1), idire])


print(results) # Each column is a direction, each row is the number of bins.

> [[ 2.  4.  3.]
   [ 1.  0.  1.]
   [ 1.  0.  0.]]

So far this code does not work, only for the direction (0,1,0), which does not have any connection and it is trivial. With this notation, the expected output should be:

> [[ 2.  4.  1.]
   [ 1.  0.  0.]
   [ 0.  0.  1.]]

The comparision of the arrays is from this link.

EDIT 2:

I had a mistake in the interpretation of the coordinates, it seems that addint to the results of np.argwhere, (1,0,0) is over the z axist, so the expected result should be:

> [[ 1.  4.  2.]
   [ 0.  0.  1.]
   [ 1.  0.  0.]]
Greynes
  • 719
  • 2
  • 7
  • 15
  • array.tranpose() may be useful . That said, you should show any attempts you have so far. – perigon Jul 20 '17 at 09:24
  • Are those max. runs, because each element along the dims that are not reduced would have some run value? Shows us your best effort, maybe a loopy code to help us get some sense of the problem? – Divakar Jul 20 '17 at 09:31
  • I am really stuck at this problem. I tried to start with loops, but I am able to find if there is a 1 conected, but not the size of the conection. So far my code is not worth to post. – Greynes Jul 20 '17 at 09:59
  • What if the direction is `(-1,0,1)`? Do you need to support non-axial directions? If not, a simpler api would be to pass `axis=n` – Eric Jul 20 '17 at 11:06
  • Yes, in theory all directions are possible – Greynes Jul 20 '17 at 11:08
  • I imagine there's a solution using `cumsum` to count them, some clever boolean masking, and `bincoint` to group them - but that'll only work for axial directions. – Eric Jul 20 '17 at 11:11
  • I think its possible to turn a non-axial problem into an axial one by padding the array with enough zeros, and then using `as_strided – Eric Jul 20 '17 at 11:14
  • I added the approach I was working, I do know if it makes a lot of sense. – Greynes Jul 20 '17 at 12:07

3 Answers3

1

Here's a vectorized solution that works along an axial direction:

def run_lengths(arr, axis):
    # convert to boolean, for speed - no need to store 0 and 1 as floats
    arr = arr.astype(bool)

    # move the axis in question to the start, to make the slicing below easy
    arr = np.moveaxis(arr, axis, 0)

    # find positions at the start and end of a run
    first = np.empty(arr.shape, dtype=bool)
    first[:1] = arr[:1]
    first[1:] = arr[1:] & ~arr[:-1]
    last = np.zeros(arr.shape, dtype=bool)
    last[-1:] = arr[-1]
    last[:-1] = arr[:-1] & ~arr[1:]

    # count the number in each run
    c = np.cumsum(arr, axis=0)
    lengths = c[last] - c[first]

    # group the runs by length. Note the above gives 0 for a run of length 1,
    # so result[i] is the number of runs of length (i + 1)
    return np.bincount(lengths, minlength=len(arr))

for i in range(3):
    print(run_lengths(array, axis=i))
[1 0 1]
[4 0 0]
[2 1 0]
Eric
  • 95,302
  • 53
  • 242
  • 374
1

Here is a solution that works in any direction and combination of directions. Essentially, we use the array to create a graph in which True elements correspond to nodes and edges are created if there is a node in any of the allowed directions. We then compute the connected components and their sizes. Uses networkx to find the connected components in the graph, which is easily installed via pip.

 import numpy as np
 import networkx

 def array_to_graph(bool_array, allowed_steps):
     """
     Arguments:
     ----------
     bool_array    -- boolean array
     allowed_steps -- list of allowed steps; e.g. given a 2D boolean array,
                      [(0, 1), (1, 1)] signifies that from coming from element
                      (i, j) only elements (i, j+1) and (i+1, j+1) are reachable

     Returns:
     --------
     g               -- networkx.DiGraph() instance
     position_to_idx -- dict mapping (i, j) position to node idx
     idx_to_position -- dict mapping node idx to (i, j) position
     """

     # ensure that bool_array is boolean
     assert bool_array.dtype == np.bool, "Input array has to be boolean!"

     # map array indices to node indices and vice versa
     node_idx = range(np.sum(bool_array))
     node_positions = zip(*np.where(bool_array))
     position_to_idx = dict(zip(node_positions, node_idx))

     # create graph
     g = networkx.DiGraph()
     for source in node_positions:
         for step in allowed_steps: # try to step in all allowed directions
             target = tuple(np.array(source) + np.array(step))
             if target in position_to_idx:
                 g.add_edge(position_to_idx[source], position_to_idx[target])

     idx_to_position = dict(zip(node_idx, node_positions))

     return g, idx_to_position, position_to_idx

 def get_connected_component_sizes(g):
     component_generator = networkx.connected_components(g)
     component_sizes = [len(component) for component in component_generator]
     return component_sizes

 def test():
     array = np.array([[[ 1.,  1.,  0.],
                        [ 0.,  0.,  0.],
                        [ 0.,  0.,  0.]],
                       [[1.,  0.,  0.],
                        [ 0.,  0.,  0.],
                        [ 0.,  0.,  0.]],
                       [[ 1.,  0.,  0.],
                        [ 0.,  0.,  0.],
                        [ 0.,  0.,  0.]]], dtype=np.bool)

     directions = [(1,0,0), (0,1,0), (0,0,1)]

     for direction in directions:
         graph, _, _ = array_to_graph(array, [direction])
         sizes = get_connected_component_sizes(graph.to_undirected())
         counts = np.bincount(sizes)

         print direction
         for ii, c in enumerate(counts):
             if ii > 1:
                 print "Runs of size {}: {}".format(ii, c)

     return

NOTE: "Runs" of size 1 are not counted (correctly) but I presume you are not that interested in them anyway. Otherwise, you can easily compute them ex-post by computing the total number of True elements in the particular direction and subtracting number of runs times run length.

Paul Brodersen
  • 11,221
  • 21
  • 38
  • `zip(*np.where(...))` is better spelt `argwhere`, and then you can change `map(operator.add, source, step)` to just `source + np.array(step)` – Eric Jul 20 '17 at 13:14
  • I need to do another dictionary lookup though: `g.add_edge(position_to_idx[source], position_to_idx[target])` and arrays are not hashable.... (although I agree in principle that the line is question is ugly). – Paul Brodersen Jul 20 '17 at 13:16
  • Right, you still need the tuple conversion, but you can at least ditch the `map` – Eric Jul 20 '17 at 13:24
  • It does not work in my computer. The variable sizes (and counts) is empty. – Greynes Jul 21 '17 at 07:57
  • Weird. Can you add the following line after the graph is created: `print list(graph.nodes()), list(graph.edges())` and tell me the output? Which python version are you running? – Paul Brodersen Jul 21 '17 at 10:09
0

I add a variation of the code I was working in that now it seems it works:

import numpy as np


array = np.array([[[ 1.,  1.,  0.],
                   [ 0.,  0.,  0.],
                   [ 0.,  0.,  0.]],
                  [[1.,  0.,  0.],
                   [ 0.,  0.,  0.],
                   [ 0.,  0.,  0.]],
                  [[ 1.,  0.,  0.],
                   [ 0.,  0.,  0.],
                   [ 0.,  0.,  0.]]])

dire = np.array(([1, 0, 0], [0, 1, 0], [0, 0, 1])) # Possible directions
results = np.zeros((array.shape[0], len(dire)))  # Maximum runs will be 3

# Find all indices
indices = np.argwhere(array == 1) 

# Loop over each direction                 
for idire, dire in enumerate(dire):
    results[0, idire] = len(indices) # Count all 1 (conection 0)
    indices_to_compare = indices 

    for irun in range(1, array.shape[0]):
        print('dir:',dire, 'run', irun)
        indices_displaced = indices + dire*irun             
        aset = set([tuple(x) for x in indices_displaced])
        bset = set([tuple(x) for x in indices_to_compare])
        indices_to_compare = (np.array([x for x in aset & bset]))
        results[irun, idire] = len(indices_to_compare)


    for iindx in np.arange(array.shape[0]-2,-1,-1):
        for jindx in np.arange(iindx+1, array.shape[0]):
            print(iindx,jindx,  results[jindx, idire])
            results[iindx, idire] -=  results[jindx, idire] *(jindx-iindx+1)         


print('\n',results)
Greynes
  • 719
  • 2
  • 7
  • 15