
I have a 2D NumPy array, A. I want to apply np.bincount() to each column of A to generate another 2D array, B, whose columns are the bincounts of the corresponding columns of A.

My problem is that np.bincount() is a function that takes a 1D array-like. It's not an array method that accepts an axis argument, like B = A.max(axis=1) for example.

Is there a more pythonic/numpythic way to generate this B array other than a nasty for-loop?

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))

for x in range(A.shape[1]):
    B[:,x] =  np.bincount(A[:,x])
user3556757

4 Answers


Using the same philosophy as in this post, here's a vectorized approach -

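m = A.shape[1]              # number of columns
n = A.max()+1               # number of bins per column
A1 = A + (n*np.arange(m))   # shift column j by j*n so bins of different columns never overlap
out = np.bincount(A1.ravel(),minlength=n*m).reshape(m,-1).T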
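
To make the offset trick above easier to follow, here is a small worked sketch on a hand-picked 3x2 array. The names `offsets`, `flat` and `expected` are only illustrative and are not part of the answer; the loop at the end is there purely as a cross-check.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 1],
              [0, 3]])

m = A.shape[1]        # number of columns
n = A.max() + 1       # number of bins per column

# Shift column j by j*n: column 0 keeps values 0..n-1,
# column 1 is mapped to n..2n-1, and so on.
offsets = n * np.arange(m)
A1 = A + offsets

# One flat bincount over all shifted values, then cut into m blocks of n bins.
flat = np.bincount(A1.ravel(), minlength=n * m)
out = flat.reshape(m, n).T   # shape (n, m): one column of counts per column of A

# Cross-check against the explicit per-column loop, padded to n bins.
expected = np.stack(
    [np.bincount(A[:, j], minlength=n) for j in range(m)], axis=1
)
assert np.array_equal(out, expected)
```

Because `n` is taken from the global maximum of A, every column gets the same number of bins, even if a column never contains the largest value.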
Divakar
  • This solution is really brilliant! Just in case anyone wants to do bincount for each row like me, here is the modified code: `m = A.shape[0]; n = A.max()+1; A1 = A + (n*np.arange(m).reshape(m,1)); out = np.bincount(A1.ravel(),minlength=n*m).reshape(-1,n)` – Lala La Oct 13 '19 at 22:06
  • @LalaLa Yeah, there's a separate and detailed Q&A for that - https://stackoverflow.com/questions/46256279/. – Divakar Oct 13 '19 at 22:09

I would suggest using np.apply_along_axis, which allows you to apply a 1D function (in this case np.bincount) to 1D slices of a higher-dimensional array:

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))

# Apply np.bincount to every column (each 1D slice along axis 0)
B = np.apply_along_axis(np.bincount, axis=0, arr=A)

You'll have to be careful, though. This (as well as your suggested for-loop) only works if the output of np.bincount has the right shape. If the maximum state is absent from one or more columns of your array A, np.bincount returns a shorter array for those columns, and the code will fail with a ValueError.
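
One way around the shape problem, as a minimal sketch: np.apply_along_axis forwards extra keyword arguments to the 1D function, so np.bincount's `minlength` can be used to pad every column to the same length. This assumes the number of states is known up front; the small array below is hand-picked so that column 0 never contains the largest state.

```python
import numpy as np

states = 4

# Column 0 only contains 0, 1, 2, so a plain bincount of it has 3 entries,
# while column 1 (which contains state 3) yields 4 entries.
A = np.array([[0, 3],
              [1, 3],
              [2, 0]])

# minlength is passed through to np.bincount for every column,
# so each result has exactly `states` entries.
B = np.apply_along_axis(np.bincount, 0, A, minlength=states)
```

This only fixes the shape issue; it is still a Python-level loop over the columns under the hood.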

jotasi

This solution using the numpy_indexed package (disclaimer: I am its author) is fully vectorized, and thus does not involve any Python loops behind the scenes. Also, there are no restrictions on the input; not every column needs to contain the same set of unique values.

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape)
(bins, col), B = npi.count_table(A.flatten(), colidx.flatten())

This gives an alternative (sparse) representation of the same result, which may be much more appropriate if the B array does indeed contain many zeros:

(bins, col), count = npi.count((A.flatten(), colidx.flatten()))

Note that apply_along_axis is just syntactic sugar for a for-loop, and has the same performance characteristics.

Eelco Hoogendoorn

Yet another possibility:

import numpy as np


def bincount_columns(x, minlength=None):
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    colidx = np.arange(ncols)[None, :]
    np.add.at(count, (x, colidx), 1)
    return count

For example,

In [110]: x
Out[110]: 
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])

In [111]: bincount_columns(x)
Out[111]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])

In [112]: bincount_columns(x, minlength=7)
Out[112]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
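
The function above relies on np.add.at rather than a plain fancy-indexed `+=`. A minimal sketch of why that matters, using a small hand-picked array (the names `buffered` and `unbuffered` are only illustrative): with `+=`, a repeated (value, column) index pair is only incremented once, whereas np.add.at accumulates every occurrence.

```python
import numpy as np

x = np.array([[0, 1],
              [0, 1],
              [2, 1]])
colidx = np.arange(x.shape[1])[None, :]

# Buffered update: repeated index pairs are written only once,
# so counts never exceed 1.
buffered = np.zeros((3, 2), dtype=int)
buffered[x, colidx] += 1

# Unbuffered update: every occurrence of an index pair is accumulated,
# which is what a bincount needs.
unbuffered = np.zeros((3, 2), dtype=int)
np.add.at(unbuffered, (x, colidx), 1)
```

Here `unbuffered` holds the true per-column counts, while `buffered` only records which (value, column) pairs occur at all.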
Warren Weckesser