vectorizing numpy bincount

Question

I have a 2d numpy array, A. I want to apply np.bincount() to each column of A to generate another 2d array, B, that is composed of the bincounts of each column of A.

My problem is that np.bincount() is a function that takes a 1d array-like. It's not an array method like B = A.max(axis=1) for example.

Is there a more pythonic/numpythic way to generate this B array other than a nasty for-loop?

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))

for x in range(A.shape[1]):
    B[:,x] =  np.bincount(A[:,x])

score 6 · Answer 1 · edited May 23 '17 at 10:30


Using the same philosophy as in this post, here's a vectorized approach -

m = A.shape[1]    
n = A.max()+1
A1 = A + (n*np.arange(m))
out = np.bincount(A1.ravel(),minlength=n*m).reshape(m,-1).T
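To see why the offset trick works, here is a small worked sketch (the array values are made up for illustration). Adding `n*j` to column `j` moves each column into its own disjoint range of integers, so a single flat `np.bincount` call counts all columns at once, and the reshape splits the counts back out per column. The loop-based `expected` array is only there as a cross-check.

```python
import numpy as np

# Small illustrative input: 3 rows, 2 columns, values in 0..3
A = np.array([[0, 1],
              [2, 1],
              [2, 3]])

m = A.shape[1]               # number of columns
n = A.max() + 1              # number of bins per column (here 4)
offsets = n * np.arange(m)   # [0, 4]: one offset per column
A1 = A + offsets             # column j now only holds values in [j*n, (j+1)*n)

# One flat bincount over all shifted values, then split into per-column counts
flat = np.bincount(A1.ravel(), minlength=n * m)
out = flat.reshape(m, -1).T  # shape (n, m): row = bin, column = original column

# Cross-check against the straightforward per-column loop
expected = np.stack(
    [np.bincount(A[:, j], minlength=n) for j in range(m)], axis=1
)
print(out)
print(np.array_equal(out, expected))
```

Note that `n` is taken from the data here; if the number of states is known in advance (as with `states` in the question), using that value instead keeps the output shape fixed even when the largest state happens to be absent.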


answered Nov 14 '16 at 16:26

Divakar


This solution is really brilliant! Just in case anyone wants to do bincount for each row like me, here is the modified code: `m = A.shape[0]; n = A.max()+1; A1 = A + (n*np.arange(m).reshape(m,1)); out = np.bincount(A1.ravel(),minlength=n*m).reshape(-1,n)` – Lala La Oct 13 '19 at 22:06

@LalaLa Yeah, there's a separate and detailed Q&A for that - https://stackoverflow.com/questions/46256279/. – Divakar Oct 13 '19 at 22:09
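The row-wise variant from the comment above can be laid out as a runnable sketch. The input array is made up for illustration; the only change from the column-wise version is that the offsets are applied per row (as a column vector), and the result needs no transpose.

```python
import numpy as np

# Illustrative input: 2 rows, 3 columns, values in 0..3
A = np.array([[0, 2, 2],
              [1, 1, 3]])

m = A.shape[0]                       # number of rows
n = A.max() + 1                      # number of bins per row
A1 = A + n * np.arange(m)[:, None]   # shift row i into the range [i*n, (i+1)*n)

# out[i, k] is the number of times value k occurs in row i
out = np.bincount(A1.ravel(), minlength=n * m).reshape(m, n)
print(out)
```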

jotasi · Answer 2 · 2016-11-14T15:26:26.600

I would suggest using np.apply_along_axis, which allows you to apply a 1D function (in this case np.bincount) to 1D slices of a higher-dimensional array:

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
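# Apply np.bincount to each column (the 1D slices along axis 0);
# no preallocation of B is needed because the result is returned.
B = np.apply_along_axis(np.bincount, axis=0, arr=A)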

You'll have to be careful, though. This (as well as your suggested for-loop) only works if the output of np.bincount has the same length for every column. If the maximum state is missing from one or more columns of your array A, the output for those columns will be shorter than for the others, and the code will fail with a ValueError.
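One way to sidestep the length mismatch, sketched here under the assumption that the number of states is known up front, is to pass `minlength` through to `np.bincount`. `np.apply_along_axis` forwards extra keyword arguments to the applied function, so every column yields a result of the same length. The input array is made up so that the first column lacks the largest state.

```python
import numpy as np

states = 4
# Column 0 never contains the states 2 or 3; column 1 contains state 3
A = np.array([[0, 0],
              [1, 3],
              [1, 2]])

# Without minlength, the per-column results differ in length
len0 = len(np.bincount(A[:, 0]))   # only bins 0 and 1 are produced
len1 = len(np.bincount(A[:, 1]))   # bins 0..3 are produced
print(len0, len1)

# Forwarding minlength pads every column's counts to `states` bins
B = np.apply_along_axis(np.bincount, 0, A, minlength=states)
print(B)
```

This is still a Python-level loop under the hood, so it addresses correctness rather than speed.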

Note that apply_along_axis is just syntactic sugar for a for-loop, and has the same performance characteristics. — Eelco Hoogendoorn, Nov 14 '16 at 16:17

Eelco Hoogendoorn · Answer 3 · 2016-11-14T16:16:08.810

This solution using the numpy_indexed package (disclaimer: I am its author) is fully vectorized, thus does not include any python loops behind the scenes. Also, there are no restrictions on the input; not every column needs to contain the same set of unique values.

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape)
(bin, col), B = npi.count_table(A.flatten(), colidx.flatten())

This gives an alternative (sparse) representation of the same result, which may be much more appropriate if the B array does indeed contain many zeros:

(bin, col), count = npi.count((A.flatten(), colidx.flatten()))
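For readers without numpy_indexed installed, the same sparse idea (count each distinct (value, column) pair) can be sketched in plain NumPy with `np.unique`. This is an illustration of the concept, not how numpy_indexed implements it, and the ordering of the results is not guaranteed to match what `npi.count` returns. The input array is made up.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 1],
              [2, 3]])

# Column index of every element, same shape as A
colidx = np.indices(A.shape)[1]

# One (value, column) pair per element, then count the distinct pairs
pairs = np.stack([A.ravel(), colidx.ravel()], axis=1)
unique_pairs, count = np.unique(pairs, axis=0, return_counts=True)
print(unique_pairs)   # each row is a (bin, col) pair that actually occurs
print(count)          # how often that pair occurs

# Optional: scatter the sparse counts into the dense table B
B = np.zeros((A.max() + 1, A.shape[1]), dtype=int)
B[unique_pairs[:, 0], unique_pairs[:, 1]] = count
print(B)
```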

Note that apply_along_axis is just syntactic sugar for a for-loop, and has the same performance characteristics.

score 1 · Answer 4 · answered Nov 14 '16 at 16:32

Yet another possibility:

import numpy as np


def bincount_columns(x, minlength=None):
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    colidx = np.arange(ncols)[None, :]
    np.add.at(count, (x, colidx), 1)
    return count

For example,

In [110]: x
Out[110]: 
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])

In [111]: bincount_columns(x)
Out[111]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])

In [112]: bincount_columns(x, minlength=7)
Out[112]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
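The function relies on `np.add.at` rather than a plain `count[x, colidx] += 1`, and the following sketch (with made-up data) shows why. With fancy indexing, `+=` is buffered: when the same (bin, column) position appears several times, it is incremented only once. `np.add.at` performs the addition unbuffered, so every occurrence is counted.

```python
import numpy as np

# Column 0 contains the value 1 three times; column 1 contains 0 twice
x = np.array([[1, 0],
              [1, 0],
              [1, 2]])
colidx = np.arange(x.shape[1])[None, :]   # broadcasts against x

# Buffered in-place add: repeated index pairs are incremented only once
buffered = np.zeros((3, 2), dtype=int)
buffered[x, colidx] += 1

# Unbuffered add: every occurrence of an index pair contributes
unbuffered = np.zeros((3, 2), dtype=int)
np.add.at(unbuffered, (x, colidx), 1)

print(buffered)
print(unbuffered)
```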
