
I have a 3D numpy array input_data (q x m x n) that I am using to build histogram data to eventually plot, stored in plot_data (m x n x num_bins x 2). This step is a decent bottleneck in my process, and I was wondering if there is a faster, more "numpy" way of doing this.

import numpy as np

num_bins = 3
plot_data = np.zeros((m, n, num_bins, 2))
for i in range(m):
    for j in range(n):
        data = input_data[:, i, j]
        hist, bins = np.histogram(data, bins=num_bins)

        # Pair each bin's left edge with its count to get the (x, y) points to plot
        plot_data[i, j] = np.stack((bins[:-1], hist), axis=1)
– holtc
  • By faster do you mean more concise? – Luke Kot-Zaniewski Oct 11 '17 at 19:00
  • Faster meaning runtime, so ideally taking advantage of numpy's vectorization abilities, similar to using np.sum() to compute a sum rather than looping through and computing it manually – holtc Oct 11 '17 at 19:03
  • So, I looked up some information. Perhaps the scipy website would be what you are looking for? I found the following: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.nditer.html – Maderas Oct 11 '17 at 19:07
  • Your data are (q, m, n). What are the q, m, n values, for further investigation? – B. M. Oct 11 '17 at 19:10
  • q, m, n can be anything, but to approximate my data, probably 100000 x 10 x 360. @Maderas I don't think it is as effective for multi-dimensional arrays unfortunately. – holtc Oct 11 '17 at 19:15
  • So the hard job is np.histogram. Since it is done at the numpy level, it's optimized; the 3600 loops executed at the Python level are peanuts here. – B. M. Oct 11 '17 at 19:19
  • I figured that may be the case; I was just wondering if there was some multi-dimensional array optimization that numpy provides for histograms, using histogramdd or something (see the sketch after these comments) – holtc Oct 11 '17 at 19:21
  • @Maderas: `nditer` doesn't help here. It's just another iteration tool, still requiring a Python-level loop. – hpaulj Oct 11 '17 at 19:44
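
A brief aside on the tools raised in these comments: np.histogramdd computes one joint d-dimensional histogram of a set of points, not q independent 1-D histograms per (i, j) cell, so it doesn't apply here; and np.apply_along_axis only hides the Python-level loop rather than removing it. A hedged sketch with toy sizes (input_data here is made-up stand-in data):

import numpy as np

input_data = np.random.rand(1000, 4, 5)   # stand-in for the real (q, m, n) data

# apply_along_axis runs np.histogram once per (i, j) cell, still looping in Python
hists = np.apply_along_axis(lambda v: np.histogram(v, bins=3)[0], 0, input_data)
print(hists.shape)  # (3, 4, 5): one 3-bin count vector per cell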

2 Answers


Here's a vectorized approach for a generic number of bins -

def vectorized_app(input_data, num_bins):
    # Per-cell min and max along the first axis define each histogram's range
    s0 = input_data.min(0)
    s1 = input_data.max(0)

    m,n,r = input_data.shape  # (q, m, n) in the question's notation

    # Scale every sample to a bin id in [0, num_bins); the clip folds the
    # maximum value into the last bin, matching np.histogram's closed last edge
    ids = (num_bins*((input_data - s0)/(s1-s0))).astype(int).clip(max=num_bins-1)

    # Shift each (i, j) cell's ids into a disjoint range so a single bincount
    # tallies all n*r histograms at once
    offset = num_bins*(r*np.arange(n)[:,None] + np.arange(r))
    ids3D = ids + offset
    count3D = np.bincount(ids3D.ravel(), minlength=n*r*num_bins).reshape(n,r,-1)

    # Left bin edges per cell, mirroring np.histogram's bins[:-1]
    bins3D = create_ranges_nd(s0, s1, num_bins+1)[...,:-1]

    out = np.empty((n,r,num_bins,2))
    out[...,0] = bins3D
    out[...,1] = count3D
    return out

Helper function(s) -

# https://stackoverflow.com/a/46694364/ @Divakar
def create_ranges_nd(start, stop, N, endpoint=True):
    # Vectorized linspace: one N-point range per element of start/stop
    if endpoint:
        divisor = N-1
    else:
        divisor = N
    steps = (1.0/divisor) * (stop - start)
    return start[...,None] + steps[...,None]*np.arange(N)
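
The core trick above is the offset: each (i, j) cell's bin ids are shifted into a disjoint range of length num_bins, so one np.bincount over the flattened array tallies every per-cell histogram in a single call. A minimal sketch with toy sizes (the data here is made up; names mirror the answer's):

import numpy as np

num_bins = 3
ids = np.random.randint(0, num_bins, (1000, 4, 5))  # per-cell bin ids
n, r = ids.shape[1:]

# Shift cell (i, j)'s ids into [num_bins*(r*i + j), num_bins*(r*i + j) + num_bins)
offset = num_bins * (r * np.arange(n)[:, None] + np.arange(r))
counts = np.bincount((ids + offset).ravel(), minlength=n * r * num_bins).reshape(n, r, -1)

# Sanity check against a per-cell bincount
assert (counts[2, 3] == np.bincount(ids[:, 2, 3], minlength=num_bins)).all()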

Runtime test

Original approach -

def org_app(input_data, num_bins):
    q,m,n = input_data.shape
    plot_data = np.zeros((m,n,num_bins,2))
    for i in range(m):
        for j in range(n):
            data = input_data[:, i, j]
            hist, bins = np.histogram(data, bins=num_bins)
            plot_data[i][j] = np.stack((bins[:-1], hist), axis=1)
    return plot_data

Timings and verification -

Let's test it out on a large data array of shape (100, 100, 100) with the number of bins set to 10:

In [967]: # Setup input
     ...: num_bins = 10
     ...: m = 100
     ...: n = 100
     ...: q = 100
     ...: input_data = np.random.rand(q,m,n)
     ...: 
     ...: out1 = org_app(input_data, num_bins)
     ...: out2 = vectorized_app(input_data, num_bins)
     ...: print np.allclose(out1, out2)
     ...: 
True

In [968]: %timeit org_app(input_data, num_bins)
1 loop, best of 3: 748 ms per loop

In [969]: %timeit vectorized_app(input_data, num_bins)
100 loops, best of 3: 12.7 ms per loop

In [970]: 748/12.7 # speedup with vectorized one over original
Out[970]: 58.89763779527559
– Divakar
  • what is the performance of this vs the for loops over a large 3d array? – holtc Oct 11 '17 at 20:24
  • @holtc Added assuming random numbers of shape (100,100,100) and number of bins = 10. – Divakar Oct 11 '17 at 20:32
  • So when I run this on my real data, np.allclose() is false, and since I am jitting all my methods, it is actually slower, so unfortunately I won't be able to use this. In my case, input_data will be on the order of (100000, 8, 360) with a max of 100 bins – holtc Oct 12 '17 at 16:11
  • @holtc Yeah, so your loop count is a mere 8 * 360. That's not much. Keep your loop/jittery-jit. – Divakar Oct 12 '17 at 16:20
  • @holtc It would help if you could mention those size/shape details in your future questions upfront. – Divakar Oct 12 '17 at 16:24

I think your samples are similar, so the histograms are similar. In this case you can simplify the comparisons and do them in a more vectorized manner:

a=np.random.rand(100000,10,10)


def f():  # roughly your approach
    plotdata=np.zeros((10,10,3),np.int32)
    for i in range(10):
        for j in range(10):
            hist,bins=np.histogram(a[:,i,j],3)
            plotdata[i,j]=hist
    return plotdata

def g():  # vectorized comparisons
    # For uniform samples every cell's bin edges land near 1/3 and 2/3,
    # so fixed thresholds reproduce the per-cell histograms
    u=(a < 1/3).sum(axis=0)
    w=(a > 2/3).sum(axis=0)
    v=len(a)-u-w
    return np.dstack((u,v,w))

For a roughly 7x improvement:

In [213]: %timeit f()
548 ms ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [214]: %timeit g()
77.7 ms ± 5.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
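
If the edges aren't simply 1/3 and 2/3, the same fixed-edges idea generalizes with np.searchsorted, assuming one set of edges can be shared by every cell (a hedged sketch, not B. M.'s code):

import numpy as np

a = np.random.rand(100000, 10, 10)
edges = np.linspace(0, 1, 4)  # 3 bins; any shared, sorted edges work

# Map every sample to a bin index in one vectorized call, then count per cell
idx = np.searchsorted(edges, a, side='right').clip(1, len(edges) - 1) - 1
counts = np.stack([(idx == b).sum(axis=0) for b in range(len(edges) - 1)], axis=-1)
print(counts.shape)  # (10, 10, 3)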
– B. M.