3

I have the following 2D array:

regions = array([[3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4],
                 [3, 3, 3, 3, 8, 8, 8, 8, 8, 4, 4, 4, 4],
                 [3, 3, 3, 3, 8, 8, 8, 8, 8, 4, 4, 4, 4],
                 [3, 3, 3, 3, 8, 8, 8, 8, 8, 4, 4, 4, 4],
                 [3, 6, 6, 6, 8, 8, 8, 8, 8, 7, 7, 7, 4],
                 [3, 6, 6, 6, 8, 8, 8, 8, 8, 7, 7, 7, 4],
                 [3, 6, 6, 6, 6, 8, 8, 8, 7, 7, 7, 7, 4],
                 [3, 6, 6, 6, 6, 2, 2, 2, 7, 7, 7, 7, 4],
                 [5, 6, 6, 6, 6, 2, 2, 2, 7, 7, 7, 7, 1],
                 [5, 6, 6, 6, 6, 2, 2, 2, 7, 7, 7, 7, 1],
                 [5, 6, 6, 6, 6, 2, 2, 2, 7, 7, 7, 7, 1],
                 [5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1],
                 [5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]])

I want to find the neighbouring numbers for each individual number. For example, 3 is a neighbour of 4, 5, 6, and 8. Currently I am doing this exercise with a for loop, as in the code below.

import numpy as np
import scipy.ndimage
from skimage.morphology import disk  # disk-shaped structuring element

numbers = np.unique(regions)
# bounding-box slices for every label; this only needs to be computed once
slices = scipy.ndimage.find_objects(regions)
for i in numbers:
    index = i - 1
    sub_im = regions[slices[index]]
    im = sub_im == i
    # grow the region by one pixel and read off the labels it touches
    neighbors = scipy.ndimage.binary_dilation(input=im, structure=disk(1))
    neighbors = neighbors * sub_im
    neighbors_list = np.unique(neighbors)[1:] - 1
    print(neighbors_list)

The problem is that I don't want to use a for loop, as my regions array contains millions of elements. Is there a fast way to solve this problem without a for loop?

ZAK
  • You could iterate over 2x2 windows; if there is a 3 in the window, *save* all the numbers in the window that are not 3. Search for numpy sliding/moving windows for an efficient *window* iterator/generator/stride trick (a sketch of this idea appears after these comments). It is worth a try for comparison. – wwii Jan 04 '18 at 22:50
  • Thanks for your response, @wwii. Do you have a simple link that explains this method? – ZAK Jan 04 '18 at 23:04
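
A minimal sketch of the sliding-window idea from the comment above (my illustration, not wwii's code; it assumes numpy >= 1.20 for np.lib.stride_tricks.sliding_window_view, and note that 2x2 windows also treat diagonal contacts as neighbours):

import numpy as np

# view of every 2x2 window of `regions`, shape (H-1, W-1, 2, 2)
windows = np.lib.stride_tricks.sliding_window_view(regions, (2, 2))
pairs = windows.reshape(-1, 4)

# neighbours of 3: every value sharing a 2x2 window with a 3
has_three = (pairs == 3).any(axis=1)
neighbours_of_3 = np.unique(pairs[has_three])
print(neighbours_of_3[neighbours_of_3 != 3])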

2 Answers

1

You can accomplish this with numpy, using np.roll(), np.where(), and np.unique(). The idea is to append to each entry its four neighbors, and then to pull out the unique members of those lists of five (the entry and its four neighbors). Here is an implementation that should clarify:

import numpy as np

# make a 3d array holding each matrix entry and its four neighbors
neighbor_array = np.array([regions,
                           np.roll(regions, +1, axis=0),
                           np.roll(regions, -1, axis=0),
                           np.roll(regions, +1, axis=1),
                           np.roll(regions, -1, axis=1),
                           ])
# if you want neighbors to include wraparounds, use the above as-is; if not, prune the edges
neighbor_array_pruned = neighbor_array[:, 1:-1, 1:-1]
# reshape to a 2d array of entries x neighbors
neighbor_list = np.reshape(neighbor_array_pruned, [5, -1]).T
# collect the unique neighbors of each number into a dictionary
neighbor_dict = {}
for num in np.unique(regions):
    neighbor_dict[num] = np.unique(neighbor_list[np.where(neighbor_list[:, 0] == num)])

This produces neighbor_dict:

{1: array([1, 2, 7]),
 2: array([1, 2, 5, 6, 7, 8]),
 3: array([3, 6, 8]),
 4: array([4, 7, 8]),
 5: array([2, 5, 6]),
 6: array([2, 3, 5, 6, 8]),
 7: array([1, 2, 4, 7, 8]),
 8: array([2, 3, 4, 6, 7, 8])}

Note that I pruned the edges; if you want to include wraparound neighbors, or do something more nuanced, you can elaborate on that pruning line.
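
As one possible elaboration (my sketch, not part of the original answer; it follows the padding idea raised later in the comments): pad the array with a sentinel value such as -1 before rolling, so that adjacencies that only occur along the border (e.g. 1 and 4 in the question's array) are kept without introducing wraparound neighbors:

padded = np.pad(regions, 1, mode='constant', constant_values=-1)
padded_stack = np.array([padded,
                         np.roll(padded, +1, axis=0),
                         np.roll(padded, -1, axis=0),
                         np.roll(padded, +1, axis=1),
                         np.roll(padded, -1, axis=1),
                         ])
# dropping the padding ring leaves every original entry with valid neighbors
inner = padded_stack[:, 1:-1, 1:-1]
pairs = np.reshape(inner, [5, -1]).T
neighbor_dict = {}
for num in np.unique(regions):
    vals = np.unique(pairs[pairs[:, 0] == num])
    neighbor_dict[num] = vals[vals >= 0]   # discard the -1 sentinel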

muskrat
  • Thanks, @muskrat. How can I formulate this code when `regions` is a 3D array? – ZAK Jan 05 '18 at 02:13
  • Use `roll` on `axis=2` to get the neighbors in that third dimension as well; your `neighbor_list` will have 7 elements (instead of the current 5), and you will have 6 `np.roll` statements instead of 4 (a sketch of this appears after these comments). Please accept the answer if you found it useful; thanks. – muskrat Jan 05 '18 at 02:58
  • Thanks for your response. The code works fine, but the `for loop` in the last two lines is slowing down the code. Can it be replaced by an array operation? – ZAK Jan 05 '18 at 17:08
  • I have tried to remove the `for loop` but so far I have been unsuccessful. – ZAK Jan 05 '18 at 20:03
  • It's not obvious to me how to get rid of that last for loop; that should be dramatically better performance than looping over the millions of entries in the regions array itself. One thing you can try to improve performance is to run `np.unique` on `neighbor_list`, that is: `neighbor_list = np.unique(np.reshape(neighbor_array_pruned,[5,-1]).T,axis=0)` Note that you need a current enough version of numpy to support the `axis` argument so that the unique operation will happen rowwise. – muskrat Jan 05 '18 at 20:21
  • Your final result `neighbor_dict` is showing incomplete values. For example, the first entry is `1: array([1, 2, 7])` while it should be `1: array([1, 2, 7, 4])`, as `4` is also connected to `1` in the `regions` array. Is there anything missing? @muskrat – ZAK Jan 06 '18 at 00:48
  • As discussed in the code comments, the pruning step ignores the edges; comment it out to include wraparounds. If you don't want wraparounds, pad the edges with NaN or 0 or similar and then ignore that value after the fact. Please upvote or accept the answer. – muskrat Jan 06 '18 at 02:32
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/162642/discussion-between-zak-and-muskrat). – ZAK Jan 06 '18 at 03:32
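
A sketch of the 3D extension described in the comment above (my illustration under the comment's assumptions; `regions3d` stands for a hypothetical 3D label array, not data from the question):

neighbor_array_3d = np.array([regions3d,
                              np.roll(regions3d, +1, axis=0),
                              np.roll(regions3d, -1, axis=0),
                              np.roll(regions3d, +1, axis=1),
                              np.roll(regions3d, -1, axis=1),
                              np.roll(regions3d, +1, axis=2),
                              np.roll(regions3d, -1, axis=2),
                              ])
# prune one voxel from every face, then reshape to entries x 7 neighbors
neighbor_list_3d = np.reshape(neighbor_array_3d[:, 1:-1, 1:-1, 1:-1], [7, -1]).T
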
1

I would suggest an approach like the one below, which is linear in the number of matrix elements (note that you cannot do better than scanning every element of the matrix at least once). Essentially, I collect the pairs of adjacent numbers into a set, and then build the neighbor dictionary on top of that set.

import numpy as np
import collections

def adj_to_neighbor_dict(adj):
    assert hasattr(adj, "__iter__")

    # map every number to the set of numbers it touches
    neighbor_dict = collections.defaultdict(set)
    for i, j in adj:
        if i == j:
            continue
        neighbor_dict[i].add(j)
        neighbor_dict[j].add(i)
    return neighbor_dict

def get_neighbors_2d(npmatrix):
    assert len(npmatrix.shape) == 2
    rows, cols = npmatrix.shape
    # vertical adjacencies (over every column) and horizontal adjacencies (over every row)
    adj_set = set(
        (npmatrix[i, j], npmatrix[i + 1, j])
        for i in range(rows - 1)
        for j in range(cols)
    ) | set(
        (npmatrix[i, j], npmatrix[i, j + 1])
        for i in range(rows)
        for j in range(cols - 1)
    )
    return adj_to_neighbor_dict(adj_set)
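
For example, with the regions array from the question, this should report 4, 5, 6 and 8 as the neighbours of 3, matching the expectation stated in the question (a small usage sketch of the function above):

neighbors = get_neighbors_2d(regions)
print(sorted(int(v) for v in neighbors[3]))   # [4, 5, 6, 8]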

I tested it on a random matrix of 1M elements with 10 different numbers (np.random.randint(0,10,(1000,1000))) and it took 1.61 seconds.


UPDATE:

The same approach can be used for 3D arrays. The code to do it is the following:

def get_neighbors_3d(npmatrix):
    assert len(npmatrix.shape) == 3
    ni, nj, nk = npmatrix.shape
    # adjacencies along each of the three axes, covering every slice
    adj_set = set(
        (npmatrix[i, j, k], npmatrix[i + 1, j, k])
        for i in range(ni - 1)
        for j in range(nj)
        for k in range(nk)
    ) | set(
        (npmatrix[i, j, k], npmatrix[i, j + 1, k])
        for i in range(ni)
        for j in range(nj - 1)
        for k in range(nk)
    ) | set(
        (npmatrix[i, j, k], npmatrix[i, j, k + 1])
        for i in range(ni)
        for j in range(nj)
        for k in range(nk - 1)
    )
    return adj_to_neighbor_dict(adj_set)

I also tested this function on a random matrix of 1M elements with 10 different numbers (np.random.randint(0,10,(100,100,100))) and it took 2.60 seconds.


I would also propose a general-purpose solution that does not depend on the shape of the np.array:

def npmatrix_shape_iter(shape):
    # iterate over every coordinate tuple of an array with the given shape
    num_dimensions = len(shape)
    last_dimension = num_dimensions - 1
    coord = [0] * num_dimensions
    while True:
        yield tuple(coord)
        coord[last_dimension] += 1
        for i in range(last_dimension, 0, -1):
            if coord[i] < shape[i]:
                break
            coord[i] = 0
            coord[i - 1] += 1
        # end condition: all the dimensions have been explored
        if coord[0] >= shape[0]:
            break

def adj_position_iter(tpl, shape):
    # yield the coordinate that follows tpl along each dimension,
    # skipping positions that fall outside the array
    new_tpl = list(tpl)
    for i in range(len(tpl)):
        new_tpl[i] += 1
        if new_tpl[i] < shape[i]:
            yield tuple(new_tpl)
        new_tpl[i] -= 1

def get_neighbors(npmatrix):
    adj_set = set(
        (npmatrix[tpl], npmatrix[adj_tpl])
        for tpl in npmatrix_shape_iter(npmatrix.shape)
        for adj_tpl in adj_position_iter(tpl, npmatrix.shape)
    )
    return adj_to_neighbor_dict(adj_set)

Since this function is general purpose, it has more work to do, and it is indeed slower than the previous ones: on the same 2D matrix of the first test it requires 6.71 s, while on the 3D matrix of the second test it requires 7.96 s.


UPDATE 2:

I updated the code for 2D and 3D matrices with a faster (and, I hope, also simpler) version. Without other constraints on the location of the numbers inside the matrix, there is no way to discover all the neighbours without scanning every cell of the matrix at least once, which is exactly what the for loops do; any function you use to accomplish this task will internally scan the whole matrix. That said, I am not claiming this is the fastest solution, because an alternative could be to use Cython or native numpy code, if it exists.
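
As one illustration of the "native numpy" route mentioned above (my sketch, not code from this answer; it reuses adj_to_neighbor_dict from earlier and assumes numpy >= 1.13 for np.unique(..., axis=0)), the same adjacency pairs can be built with array slicing, which pushes the inner loops into numpy:

def get_neighbors_2d_vectorized(npmatrix):
    # stack every vertically and horizontally adjacent pair of values
    vert = np.stack([npmatrix[:-1, :].ravel(), npmatrix[1:, :].ravel()], axis=1)
    horiz = np.stack([npmatrix[:, :-1].ravel(), npmatrix[:, 1:].ravel()], axis=1)
    # deduplicate the pairs row-wise before building the dictionary
    pairs = np.unique(np.concatenate([vert, horiz]), axis=0)
    return adj_to_neighbor_dict(map(tuple, pairs))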

Roberto Trani
  • Thanks for the response, @Roberto Trani. Can I use this approach for a 3D array? It's working fine for the 2D array. – ZAK Jan 05 '18 at 02:10
  • Thanks, @Roberto Trani. I have tried your updated code for the 3D array and it's working fine. I was wondering whether the `tpl` and `adj_tpl` `for loops` can be replaced with array operations, as these two `for loops` increase the run time a lot when I try a (300,300,100) matrix; the average time is 25 s. – ZAK Jan 05 '18 at 17:05