numpy array indicator operation

Question

I want to modify an empty bitmap using an array of indices (x and y coordinates). For every coordinate in that array, the value should be incremented by one.

This works as long as every coordinate is unique. But if the same coordinate appears more than once in my index array, the value is only incremented once.

>>> img
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

>>> inds
array([[0, 0],
       [3, 4],
       [3, 4]])

Operation:

>>> img[inds[:,1], inds[:,0]] += 1

Result:

>>> img
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0]])

Expected result:

>>> img
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0]])

Does someone have an idea how to solve this? Preferably a fast approach without the use of loops.

Also see https://stackoverflow.com/a/7435155/8033585 – AGN Gazer, May 30 '18 at 01:26

jpp · Answer 1 · 2018-05-30T01:18:34.273

This is one way. Counting algorithm courtesy of @AlexRiley.

For performance implications of relative sizes of img and inds, see @PaulPanzer's answer.

# count occurrences of each row and return array
counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)

# apply indices and counts
img[inds[:,1], inds[:,0]] += counts

print(img)

[[1 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 2 0]]
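To make the counting step less opaque, here is a small sketch on the question's data that prints the intermediate result. The variable name `same` is only for illustration and is not part of the answer. Note that the comparison builds an n × n boolean array, so memory and time grow quadratically with the number of index rows.

```python
import numpy as np

inds = np.array([[0, 0], [3, 4], [3, 4]])  # columns are (x, y)

# Broadcast (3, 1, 2) against (3, 2) -> (3, 3, 2), then require both
# coordinates to match: same[i, j] is True when row i equals row j.
same = (inds[:, None] == inds).all(axis=2)

# Each row's count is the number of rows identical to it (itself included).
counts = same.sum(axis=1)
print(counts)  # [1 2 2]

img = np.zeros((5, 5), dtype=int)
# Duplicate positions are written with the same value, so the buffered
# assignment still ends up with the full count.
img[inds[:, 1], inds[:, 0]] += counts
print(img[0, 0], img[4, 3])  # 1 2
```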

miradulo · Answer 2 · 2018-05-30T00:44:00.347

You could use numpy.add.at with a bit of manipulation to get the indices ready.

np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

This approach stays fast for larger inds arrays, though Paul Panzer's bincount solution is faster unless img is much larger than inds (see the benchmarks in his answer).
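As a minimal sketch of the difference, using the question's data: the fancy-indexed `+=` is buffered, so repeated positions are written only once, while `np.add.at` applies the addition once per index entry. The names `buffered` and `unbuffered` are just for this illustration.

```python
import numpy as np

inds = np.array([[0, 0], [3, 4], [3, 4]])  # columns are (x, y)
rows, cols = inds[:, 1], inds[:, 0]

# Buffered in-place add: the duplicate (4, 3) entry is counted once.
buffered = np.zeros((5, 5), dtype=int)
buffered[rows, cols] += 1

# Unbuffered add: every occurrence of an index contributes.
unbuffered = np.zeros((5, 5), dtype=int)
np.add.at(unbuffered, (rows, cols), 1)

print(buffered[4, 3], unbuffered[4, 3])  # 1 2
```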

Paul Panzer · Accepted Answer · 2018-05-30T01:01:47.597

Two remarks on the other two answers:

1) @jpp's can be improved by using np.unique with the axis and return_counts keywords.

2) If we translate to flat indexing we can use np.bincount which often (but not always, see last test case in benchmarks) is faster than np.add.at.
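Both remarks can be tried out on the question's data. This is only a sketch; the names `flat` and `hist` are mine. The multi-index is passed in (row, column) order, so the dimensions given to `np.ravel_multi_index` are simply `img.shape`.

```python
import numpy as np

inds = np.array([[0, 0], [3, 4], [3, 4]])  # columns are (x, y)
img = np.zeros((5, 5), dtype=np.int64)

# Remark 1: unique rows together with how often each one occurs.
unq, cnts = np.unique(inds, axis=0, return_counts=True)
print(unq.tolist(), cnts.tolist())  # [[0, 0], [3, 4]] [1, 2]

# Remark 2: convert (row, col) pairs to flat positions, then histogram them.
flat = np.ravel_multi_index((inds[:, 1], inds[:, 0]), img.shape)
print(flat.tolist())  # [0, 23, 23]

# minlength makes the histogram cover every pixel, so it can be reshaped.
hist = np.bincount(flat, minlength=img.size)
img += hist.reshape(img.shape)
print(img[0, 0], img[4, 3])  # 1 2
```

The histogram always has `img.size` entries regardless of how many indices there are, which is why this route loses its advantage when the image is large and the index array is small.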

Thanks to @miradulo for the initial version of the benchmarks.

import numpy as np

def jpp(img, inds):
    counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)
    img[inds[:,1], inds[:,0]] += counts

def jpp_pp(img, inds):
    unq, cnts = np.unique(inds, axis=0, return_counts=True)
    img[unq[:,1], unq[:,0]] += cnts

def miradulo(img, inds):
    np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

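def pp(img, inds):
    # ravel() returns a view for a contiguous img, so the in-place add updates img
    imgf = img.ravel()
    # inds columns are (x, y); reverse them to (row, col) and flatten against
    # img.shape (using img.shape[::-1] here is only correct for square images)
    indsf = np.ravel_multi_index(inds.T[::-1], img.shape)
    imgf += np.bincount(indsf, None, img.size)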

from timeit import timeit

# Three cases: few indices / small image, many indices / small image,
# few indices / large image
inds = np.random.randint(0, 5, (3, 2))
big_inds = np.random.randint(0, 5, (10000, 2))
sml_inds = np.random.randint(0, 1000, (5, 2))

for f in jpp, jpp_pp, miradulo, pp:
    print(f.__name__)
    # (index array, number of repetitions, image side length)
    for i, n, a in [(inds, 1000, 5), (big_inds, 10, 5), (sml_inds, 10, 1000)]:
        img = np.zeros((a, a), int)
        # average time per call in milliseconds
        print(timeit("f(img, i)", globals=dict(img=img, i=i, f=f), number=n) * 1000 / n, 'ms')

Output:

jpp
0.011815106990979984 ms
2623.5026352020213 ms
0.04642329877242446 ms
jpp_pp
0.041291153989732265 ms
5.418520100647584 ms
0.05826510023325682 ms
miradulo
0.007099648006260395 ms
0.7788308983435854 ms
0.009103797492571175 ms
pp
0.0035401539935264736 ms
0.06540440081153065 ms
3.486583800986409 ms

Ahh this is nice, I'll get rid of my benchmark in favor of yours. — miradulo, May 30 '18 at 00:43
@jpp There is one scenario where `bincount` should not be the best which is large `img`, small `inds`. I'll try and add that to the benchmarks. — Paul Panzer, May 30 '18 at 00:55
