4

I want to modify an empty bitmap by given indicators (x and y axis). For every coordinate given by the indicators the value should be raised by one.

So far so good everything seems to work. But if I have some similar indicators in my array of indicators it will only raise the value once.

>>> img
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

>>> inds
array([[0, 0],
       [3, 4],
       [3, 4]])

Operation:

>>> img[inds[:,1], inds[:,0]] += 1

Result:

>>> img
    array([[1, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0]])

Expected result:

>>> img
    array([[1, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 2, 0]])

Does someone have an idea how to solve this? Preferably a fast approach without the use of loops.

jpp
  • 159,742
  • 34
  • 281
  • 339
elajdsha
  • 95
  • 7

3 Answers3

6

This is one way. Counting algorithm courtesy of @AlexRiley.

For performance implications of relative sizes of img and inds, see @PaulPanzer's answer.

# count occurrences of each row and return array
counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)

# apply indices and counts
img[inds[:,1], inds[:,0]] += counts

print(img)

array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0]])
jpp
  • 159,742
  • 34
  • 281
  • 339
5

You could use numpy.add.at with a bit of manipulation to get the indices ready.

np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

If you have larger inds arrays, this approach should remain fast... (though Paul Panzer's solution is faster)

miradulo
  • 28,857
  • 6
  • 80
  • 93
4

Two remarks on the other two answers:

1) @jpp's can be improved by using np.unique with the axis and return_counts keywords.

2) If we translate to flat indexing we can use np.bincount which often (but not always, see last test case in benchmarks) is faster than np.add.at.

Thanks @miradulo for initial version of benchmarks.

import numpy as np

def jpp(img, inds):
    counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)
    img[inds[:,1], inds[:,0]] += counts

def jpp_pp(img, inds):
    unq, cnts = np.unique(inds, axis=0, return_counts=True)
    img[unq[:,1], unq[:,0]] += cnts

def miradulo(img, inds):
    np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

def pp(img, inds):
    imgf = img.ravel()
    indsf = np.ravel_multi_index(inds.T[::-1], img.shape[::-1])
    imgf += np.bincount(indsf, None, img.size)

inds = np.random.randint(0, 5, (3, 2))
big_inds = np.random.randint(0, 5, (10000, 2))
sml_inds = np.random.randint(0, 1000, (5, 2))
from timeit import timeit


for f in jpp, jpp_pp, miradulo, pp:
    print(f.__name__)
    for i, n, a in [(inds, 1000, 5), (big_inds, 10, 5), (sml_inds, 10, 1000)]:
        img = np.zeros((a, a), int)
        print(timeit("f(img, i)", globals=dict(img=img, i=i, f=f), number=n) * 1000 / n, 'ms')

Output:

jpp
0.011815106990979984 ms
2623.5026352020213 ms
0.04642329877242446 ms
jpp_pp
0.041291153989732265 ms
5.418520100647584 ms
0.05826510023325682 ms
miradulo
0.007099648006260395 ms
0.7788308983435854 ms
0.009103797492571175 ms
pp
0.0035401539935264736 ms
0.06540440081153065 ms
3.486583800986409 ms
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99
  • 2
    Ahh this is nice, I'll get rid of my benchmark in favor of yours. – miradulo May 30 '18 at 00:43
  • 2
    @jpp There is one scenario where `bincount` should not be the best which is large `img`, small `inds`. I'll try and add that to the benchmarks. – Paul Panzer May 30 '18 at 00:55