
Problem description

I have a 3D numpy array, denoted data, of shape N x R x C, i.e. N samples, R rows and C columns. I would like to compute a histogram along the column axis for each combination of sample and row. However, the bin edges (see the argument bins of numpy.histogram), of fixed length S + 1 (giving S bins), differ between rows but are shared across samples. For illustration: in the 1st sample (data[0]), the bin edge sequence for its 1st row is different from that for its 2nd row, but is the same as that for the 1st row of the 2nd sample (data[1]). All the bin edge sequences are therefore stored in a 2D numpy array of shape R x (S + 1), denoted bin_edges.
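To make the shapes concrete, here is a toy setup (all sizes and the random edges are illustrative, not from the question); hist[n, r] is then the S-bin histogram of data[n, r, :] computed with the edge sequence bin_edges[r]:

```
import numpy as np

# Toy sizes: 2 samples, 3 rows, 8 columns, 4 bins (illustrative only)
N, R, C, S = 2, 3, 8, 4
rng = np.random.default_rng(0)
data = rng.standard_normal((N, R, C))

# One monotonic edge sequence of length S + 1 per row, shared across samples
bin_edges = np.sort(rng.standard_normal((R, S + 1)), axis=1)

# Reference (loopy) definition of the desired result
hist = np.array([[np.histogram(data[n, r], bins=bin_edges[r])[0]
                  for r in range(R)] for n in range(N)])
print(hist.shape)  # (2, 3, 4), i.e. (N, R, S)
```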

My question is how to efficiently calculate the histograms?

A working but slow solution

Using numpy.histogram, I was able to come up with a working but fairly slow solution, shown in the code snippet below.

```
# Get dummy data
#     N: number of samples
#     R: number of rows (or kernels)
#     C: number of columns (or pixels)
#     S: number of bins
import numpy as np

N, R, C, S = 100, 50, 1000, 10
data = np.random.randn(N, R, C)

# for each row/kernel, pool pixels of all samples
poolsamples = np.swapaxes(data, 0, 1).reshape(R, -1)
# use quantiles as bin edges
percentiles = np.linspace(0, 100, num=(S + 1))
bin_edges = np.transpose(np.percentile(poolsamples, percentiles, axis=1))
```

```
# A working but slow solution of getting histograms along columns
hist = np.empty((N, R, S))
for idx in np.arange(R):
    bin_edges_i = bin_edges[idx, :]
    counts = np.apply_along_axis(
        lambda a: np.histogram(a, bins=bin_edges_i)[0],
        1, data[:, idx, :])
    hist[:, idx, :] = counts
```

Possible directions

  • Fancy numpy reshaping to avoid using a for loop at all
  • This problem arises from extracting low-level characteristics for each image forwarded through a trained neural network. Therefore, if the histogram extraction could be embedded in a TensorFlow graph and ultimately carried out on GPU, that would be ideal!
  • I noticed the python package fast-histogram, which claims to be 7-15x faster than numpy.histogram. However, its 1D histogram function only takes a number of bins rather than actual bin positions
  • numexpr?

I would love to hear any inputs! Thanks in advance!

statechular

1 Answer


Making use of a 2D version of np.searchsorted, searchsorted2d -
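searchsorted2d is a helper defined in a separate answer and not shown here; the sketch below is my reconstruction of the idea (not necessarily the author's exact code). It offsets each row into its own disjoint numeric range so that a single flat np.searchsorted call resolves all rows at once:

```
import numpy as np

def searchsorted2d(a, b):
    # a : (m, n) array, each row sorted; b : (m, k) values to locate row-wise
    m, n = a.shape
    # Shift row i by i * max_num, where max_num exceeds the combined value
    # range, so rows occupy disjoint intervals and one flat search suffices.
    max_num = max(a.max(), b.max()) - min(a.min(), b.min()) + 1
    r = max_num * np.arange(m)[:, None]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(m, -1)
    # Undo the flattening: subtract the n elements of each preceding row.
    return p - n * np.arange(m)[:, None]
```

Per row this agrees with calling np.searchsorted(a[i], b[i]) in a loop; the offset trick assumes finite values and enough floating-point headroom for the shifts.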

```
def vectorized_app(data, bin_edges):
    N, R, C = data.shape
    a = np.sort(data.reshape(-1,C),1)
    b = np.repeat(bin_edges[None],N,axis=0).reshape(-1,bin_edges.shape[-1])

    idx = searchsorted2d(a,b)
    idx[:,0] = 0
    idx[:,-1] = a.shape[1]
    out = (idx[:,1:] - idx[:,:-1]).reshape(N,R,-1)
    return out
```
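The counting step works because, for a sorted row, np.searchsorted(row, edges) returns how many values fall below each edge, so adjacent differences are per-bin counts; pinning the first and last indices mimics numpy.histogram's closed outer edges. A 1D sanity check (self-contained, with data inside the edge range):

```
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=1000)
edges = np.linspace(0.0, 1.0, 11)  # 10 bins spanning the data range

idx = np.searchsorted(np.sort(x), edges)
idx[0] = 0          # nothing lies below the left edge
idx[-1] = x.size    # last bin is closed on the right: count everything
counts = np.diff(idx)

assert (counts == np.histogram(x, bins=edges)[0]).all()
```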

Runtime test (org_app wraps the loopy solution from the question) -

```
In [591]: N, R, C, S = 100, 50, 1000, 10
     ...: data = np.random.randn(N, R, C)
     ...:
     ...: # for each row/kernel, pool pixels of all samples
     ...: poolsamples = np.swapaxes(data, 0, 1).reshape(R, -1)
     ...: # use quantiles as bin edges
     ...: percentiles = np.linspace(0, 100, num=(S + 1))
     ...: bin_edges = np.transpose(np.percentile(poolsamples, percentiles, axis=1))

In [592]: %timeit org_app(data, bin_edges)
1 loop, best of 3: 481 ms per loop

In [593]: %timeit vectorized_app(data, bin_edges)
1 loop, best of 3: 224 ms per loop

In [595]: np.allclose(org_app(data, bin_edges), vectorized_app(data, bin_edges))
Out[595]: True
```

More than 2x speedup there.

A closer look reveals that the bottleneck of the proposed vectorized method is the sorting itself -

```
In [594]: %timeit np.sort(data.reshape(-1,C),1)
1 loop, best of 3: 194 ms per loop
```

This sort is unavoidable here, since searchsorted requires sorted input.

Divakar