6

I have 4 2D numpy arrays, called a, b, c, d, each of them made of n rows and m columns. What I need to do is giving to each element of b and d a value calculated as follows (pseudo-code):

min_coords = min_of_neighbors_coords(x, y)
b[x,y] = a[x,y] * a[min_coords];
d[x,y] = c[min_coords];

Where min_of_neighbors_coords is a function that, given the coordinates of an element of the array, returns the coordinates of the 'neighbor' element that has the lower value. I.e., considering the array:

1, 2, 5
3, 7, 2
2, 3, 6

min_of_neighbors_coords(1, 1) will refer to the central element with the value of 7, and will return the tuple (0, 0): the coordinates of the number 1.

I managed to do this using for loops (element per element), but the algorithm is VERY slow and I'm searching a way to improve it, avoiding loops and demanding the calculations to numpy.

Is it possible?

askewchan
  • 45,161
  • 17
  • 118
  • 134
diegogb
  • 437
  • 1
  • 3
  • 10
  • 1
    in the example you provided what will min_of_neighbors_coords(0, 0) return? – Ionut Hulub Apr 03 '13 at 22:26
  • 1
    What do you want to happen along the edges? – Jaime Apr 03 '13 at 22:30
  • @IonutHulub: will return (0, 1) (the coords of the number 2). – diegogb Apr 03 '13 at 22:31
  • 1
    I would think (0,1) based on his description. Also I think it would be helpful to post the algorithm you have now so we know what we're trying to improve on. – James Porter Apr 03 '13 at 22:32
  • @Jaime: on the edges, the same calculation must be performed, see the answer to IonutHulub. – diegogb Apr 03 '13 at 22:34
  • Then you should rewrite that function as `def min_of_neighbors_coords(x, y): if x == 0 and y == 0: return (0,1) elif x == 0: return (0, y-1) elif y == 0: return (x-1, 0) else: return (x-1, y-1)` – Ionut Hulub Apr 03 '13 at 22:35
  • @JamesPorter: the question simplifies the problem, because I'm working on rasters and the source code is made to read and write .tif files without loading them all in memory. This work is for an open source plugin for Quantum GIS. See http://plugins.qgis.org/plugins/UnibasCostSurface/. Actually, the plugin works fine but is too slow. – diegogb Apr 03 '13 at 22:38
  • @IonutHulub: given the coordinates of an element, min_of_neighbors_coords returns the coordinates of an element that is near the one I'm considering and that contain a value that is lower than other's neighbors values. So, considering the number 7 in my example, it will return the coordinates of the number 1; considering the number 6, it will return the coordinates of the number 2 (the '2' above the number 6), etc. The real arrays are very large, and I can do this working element per element: for n1 in range(n): for m1 in range(m): .... but this approach is too slow :-( – diegogb Apr 03 '13 at 22:45
  • @diegogb What's the size of your tiffs (rows vs columns) and how much time does it take to process with what you have now? – heltonbiker Apr 03 '13 at 22:52
  • Another thing: what exactly is supposed to happen with `d[x,y] = c[min_coords]`? If I understand correctly, `c[min_chords]` will return a tuple; ndarray cells have to contain numbers. – James Porter Apr 03 '13 at 22:58
  • @heltonbiker The size of the source image cannot be known a priori (but my code must work with very large files, so I can't assume that I can load the entire image in memory). For an image of 150x150 pixels (very small), it can take up to 5 minutes (!!!!), but consider that the analysis I'm performing requires to calculate each value many and many times (there can be hundreds or thousands of iterations). – diegogb Apr 03 '13 at 23:00
  • 1
    @James `c[min_coords]` returns the value of array `c` at location `min_coords` – askewchan Apr 03 '13 at 23:01
  • @JamesPorter: askewchan already answered! Sorry if the original question is not very clear! – diegogb Apr 03 '13 at 23:03
  • If a = `[[1, 2, 5],[3, 0, 2],[2, 3, 6]]`. Should we get the `0` in the middle `(1,1)` or the `1` on the corner `(0,0)`? – askewchan Apr 03 '13 at 23:11
  • Instead of `c[min_coords]`, aren't you meaning something like `c.min_coords(x,y)`, or `min_coords(c, [x,y])`? – heltonbiker Apr 03 '13 at 23:21

3 Answers3

5

EDIT I have kept my original answer at the bottom. As Paul points out in the comments, the original answer didn't really answer the OP's question, and could be more easily achieved with an ndimage filter. The following much more cumbersome function should do the right thing. It takes two arrays, a and c, and returns the windowed minimum of a and the values in c at the positions of the windowed minimums in a:

def neighbor_min(a, c):
    ac = np.concatenate((a[None], c[None]))
    rows, cols = ac.shape[1:]
    ret = np.empty_like(ac)

    # Fill in the center
    win_ac = as_strided(ac, shape=(2, rows-2, cols, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[np.ogrid[:2, :rows-2, :cols] +
                    [np.argmin(win_ac[0], axis=2)]]
    win_ac = as_strided(win_ac, shape=(2, rows-2, cols-2, 3),
                        strides=win_ac.strides+win_ac.strides[2:3])
    ret[:, 1:-1, 1:-1] =  win_ac[np.ogrid[:2, :rows-2, :cols-2] +
                                 [np.argmin(win_ac[0], axis=2)]]

    # Fill the top, bottom, left and right borders
    win_ac = as_strided(ac[:, :2, :], shape=(2, 2, cols-2, 3),
                        strides=ac.strides+ac.strides[2:3])
    win_ac = win_ac[np.ogrid[:2, :2, :cols-2] +
                    [np.argmin(win_ac[0], axis=2)]]
    ret[:, 0, 1:-1] = win_ac[:, np.argmin(win_ac[0], axis=0),
                             np.ogrid[:cols-2]]
    win_ac = as_strided(ac[:, -2:, :], shape=(2, 2, cols-2, 3),
                        strides=ac.strides+ac.strides[2:3])
    win_ac = win_ac[np.ogrid[:2, :2, :cols-2] +
                    [np.argmin(win_ac[0], axis=2)]]
    ret[:, -1, 1:-1] = win_ac[:, np.argmin(win_ac[0], axis=0),
                             np.ogrid[:cols-2]]
    win_ac = as_strided(ac[:, :, :2], shape=(2, rows-2, 2, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[np.ogrid[:2, :rows-2, :2] +
                    [np.argmin(win_ac[0], axis=2)]]
    ret[:, 1:-1, 0] = win_ac[:, np.ogrid[:rows-2],
                             np.argmin(win_ac[0], axis=1)]
    win_ac = as_strided(ac[:, :, -2:], shape=(2, rows-2, 2, 3),
                        strides=ac.strides+ac.strides[1:2])
    win_ac = win_ac[np.ogrid[:2, :rows-2, :2] +
                    [np.argmin(win_ac[0], axis=2)]]
    ret[:, 1:-1, -1] = win_ac[:, np.ogrid[:rows-2],
                             np.argmin(win_ac[0], axis=1)]
    # Fill the corners
    win_ac = ac[:, :2, :2]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, 0, 0] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, :2, -2:]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, 0, -1] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, -2:, -2:]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, -1, -1] = win_ac[:, np.argmin(win_ac[0], axis=-1)]
    win_ac = ac[:, -2:, :2]
    win_ac = win_ac[:, np.ogrid[:2],
                    np.argmin(win_ac[0], axis=-1)]
    ret[:, -1, 0] = win_ac[:, np.argmin(win_ac[0], axis=-1)]

    return ret

The return is a (2, rows, cols) array that can be unpacked into the two arrays:

>>> a = np.random.randint(100, size=(5,5))
>>> c = np.random.randint(100, size=(5,5))
>>> a
array([[42, 54, 18, 88, 26],
       [80, 65, 83, 31,  4],
       [51, 52, 18, 88, 52],
       [ 1, 70,  5,  0, 89],
       [47, 34, 27, 67, 68]])
>>> c
array([[94, 94, 29,  6, 76],
       [81, 47, 67, 21, 26],
       [44, 92, 20, 32, 90],
       [81, 25, 32, 68, 25],
       [49, 43, 71, 79, 77]])
>>> neighbor_min(a, c)
array([[[42, 18, 18,  4,  4],
        [42, 18, 18,  4,  4],
        [ 1,  1,  0,  0,  0],
        [ 1,  1,  0,  0,  0],
        [ 1,  1,  0,  0,  0]],

       [[94, 29, 29, 26, 26],
        [94, 29, 29, 26, 26],
        [81, 81, 68, 68, 68],
        [81, 81, 68, 68, 68],
        [81, 81, 68, 68, 68]]])

The OP's case could then be solved as:

def bd_from_ac(a, c):
    b,d = neighbor_min(a, c)
    return a*b, d

And while there is a serious performance hit, it is pretty fast still:

In [3]: a = np.random.rand(1000, 1000)

In [4]: c = np.random.rand(1000, 1000)

In [5]: %timeit bd_from_ac(a, c)
1 loops, best of 3: 570 ms per loop

You are not really using the coordinates of the minimum neighboring element for anything else than fetching it, so you may as well skip that part and create a min_neighbor function. If you don't want to resort to cython for fast looping, you are going to have to go with rolling window views, such as outlined in Paul's link. This will typically convert your (m, n) array into a (m-2, n-2, 3, 3) view of the same data, and you would then apply np.min over the last two axes.

Unfortunately you have to apply it one axis at a time, so you will have to create a (m-2, n-2, 3) copy of your data. Fortunately, you can compute the minimum in two steps, first windowing and minimizing along one axis, then along the other, and obtain the same result. So at most you are going to have intermediate storage the size of your input. If needed, you could even reuse the output array as intermediate storage and avoid memory allocations, but that is left as exercise...

The following function does that. It is kind of lengthy because it has to deal not only with the central area, but also with the special cases of the four edges and four corners. Other than that it is a pretty compact implementation:

def neighbor_min(a):
    rows, cols = a.shape
    ret = np.empty_like(a)

    # Fill in the center
    win_a = as_strided(a, shape=(m-2, n, 3),
                       strides=a.strides+a.strides[:1])
    win_a = win_a.min(axis=2)
    win_a = as_strided(win_a, shape=(m-2, n-2, 3),
                       strides=win_a.strides+win_a.strides[1:])
    ret[1:-1, 1:-1] = win_a.min(axis=2)

    # Fill the top, bottom, left and right borders
    win_a = as_strided(a[:2, :], shape=(2, cols-2, 3),
                       strides=a.strides+a.strides[1:])
    ret[0, 1:-1] = win_a.min(axis=2).min(axis=0)
    win_a = as_strided(a[-2:, :], shape=(2, cols-2, 3),
                       strides=a.strides+a.strides[1:])
    ret[-1, 1:-1] = win_a.min(axis=2).min(axis=0)
    win_a = as_strided(a[:, :2], shape=(rows-2, 2, 3),
                       strides=a.strides+a.strides[:1])
    ret[1:-1, 0] = win_a.min(axis=2).min(axis=1)
    win_a = as_strided(a[:, -2:], shape=(rows-2, 2, 3),
                       strides=a.strides+a.strides[:1])
    ret[1:-1, -1] = win_a.min(axis=2).min(axis=1)

    # Fill the corners
    ret[0, 0] = a[:2, :2].min()
    ret[0, -1] = a[:2, -2:].min()
    ret[-1, -1] = a[-2:, -2:].min()
    ret[-1, 0] = a[-2:, :2].min()

    return ret

You can now do things like:

>>> a = np.random.randint(10, size=(5, 5))
>>> a
array([[0, 3, 1, 8, 9],
       [7, 2, 7, 5, 7],
       [4, 2, 6, 1, 9],
       [2, 8, 1, 2, 3],
       [7, 7, 6, 8, 0]])
>>> neighbor_min(a)
array([[0, 0, 1, 1, 5],
       [0, 0, 1, 1, 1],
       [2, 1, 1, 1, 1],
       [2, 1, 1, 0, 0],
       [2, 1, 1, 0, 0]])

And your original question can be solved as:

def bd_from_ac(a, c):
    return a*neighbor_min(a), neighbor_min(c)

As a performance benchmark:

In [2]: m, n = 1000, 1000

In [3]: a = np.random.rand(m, n)

In [4]: c = np.random.rand(m, n)

In [5]: %timeit bd_from_ac(a, c)
1 loops, best of 3: 123 ms per loop
Jaime
  • 65,696
  • 17
  • 124
  • 159
  • Are `rows, cols = m, n = a.shape`? `m` and `n` are undefined when I try to run it, but the convention seems to differ from the OP. – askewchan Apr 04 '13 at 00:43
  • @Jaime: cool, and thank you for spending time to write the code! – diegogb Apr 04 '13 at 11:03
  • 2
    @diegogb: is this different from scipy.ndimage.filters.minimum_filter? Also, I may have misunderstood, but didn't diegogb want to use the neighbor indices from `a` to get values from `c` to produce `b`? (as opposed to finding the minimum neighbors of `c`?) – Paul Apr 04 '13 at 13:05
  • @Paul To your first question, no it is not. And your second point also seems correct... – Jaime Apr 04 '13 at 14:18
2

Finding a[min_coords] is a rolling window operation. Several clever solutions our outlined in this post. You'll want to make the creation of the c[min_coords] array a side-effect of whichever solution you choose.

I hope this helps. I can post some sample code later when I have some time.

Community
  • 1
  • 1
Paul
  • 42,322
  • 15
  • 106
  • 123
  • Thank you for the help, I've found other links about this question but yours seems the best one. Tomorrow I'll try to study the code samples (now It's 1:00 AM in my country) and I'll see if I can use those solutions (I'm a C# programmer and I'm not used to think to problems in the way numpy does). – diegogb Apr 03 '13 at 23:11
2

I have interest in helping you, and I believe there are possibly better solutions outside the scope of your question, but in order to put my own time into writing code, I must have some feedback of yours, because I am not 100% sure I understand what you need.

One thing to consider: if you are a C# developer, maybe a "brute-force" implementation of C# can outperform a clever implementation of Numpy, so you could consider at least testing your rather simple operations implemented in C#. Geotiff (which I suppose you are reading) has a relatively friendly specification, and I guess there might be .NET GeoTiff libraries around.

But supposing you want to give Numpy a try (and I believe you should), let's take a look at what you're trying to achieve:

  1. If you are going to run min_coords(array) in every element of arrays a and c, you might consider to "stack" nine copies of the same array, each copy rolled by some offset, using numpy.dstack() and numpy.roll(). Then, you apply numpy.argmin(stacked_array, axis=2) and you get an array containing values between 0 and 8, where each of these values map to a tuple containing the offset indexes.

Then, using this principle, your min_coords() function would be vectorized, operating in the whole array at once, and giving back an array that gives you an offset which would be the index of a lookup table containing the offsets.

If you have interest in elaborating this, please leave a comment.

Hope this helps!

heltonbiker
  • 26,657
  • 28
  • 137
  • 252
  • by the way, what kind of operation are you performing with this data? Is it a Digital Elevation Model, and you are trying to obtain some sort of gradient or other terrain-oriented data? – heltonbiker Apr 04 '13 at 01:45