NumPy - Vectorizing bincount over 2D array column wise with weights

Question

I've been looking at the solutions here and here, but I'm failing to see how I can apply them to my structures.

I have 3 arrays: an (M, N) array of zeros, a (P,) array of indexes (some repeat), and a (P, N) array of values.

I can accomplish it with a loop:

# a: (M, N) zeros, accumulates the result
# b: (P, N) values
# ix: (P,) row indexes into a (some repeat)
for i in range(N):
    a[:, i] += np.bincount(ix, weights=b[:, i], minlength=M)

I've not seen any examples that use indexes in this manner, or with the weights keyword. I understand I need to bring everything into a 1D array to vectorize it, but I am struggling to figure out how to accomplish that.

Did the posted solution work for you? – Divakar Feb 04 '20 at 15:12

score 0 · Answer 1 · answered Feb 03 '20 at 19:49

The basic idea is the same as discussed in the linked posts: create a 2D array of bins with a separate offset for each piece of "1D data" to be processed (each column, in this case). With that in mind, we end up with something like this:

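# Extent of bins per col ("a" must have at least n rows)
n = ix.max() + 1
N = b.shape[1]

# 2D bins for per-col processing: column j gets its own block of n bins,
# offset by j*n, so no two columns share a bin
ix2D = ix[:, None] + n * np.arange(N)

# Finally use bincount with those 2D bins flattened and with
# flattened b as weights. The result is grouped by column, so reshape
# to (N, n) and transpose before adding back into "a".
a[:n] += np.bincount(ix2D.ravel(), weights=b.ravel(), minlength=n * N).reshape(N, -1).T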
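As a quick sanity check, the loop from the question and the offset-bin approach can be run side by side on random data. This is only a sketch: the function names `bincount_columns_loop` and `bincount_columns_vectorized`, the sizes, and the seed are made up for illustration and do not come from the original posts.

```python
import numpy as np


def bincount_columns_loop(ix, b, M):
    """Reference version: one np.bincount call per column (hypothetical helper)."""
    N = b.shape[1]
    a = np.zeros((M, N))
    for i in range(N):
        a[:, i] += np.bincount(ix, weights=b[:, i], minlength=M)
    return a


def bincount_columns_vectorized(ix, b, M):
    """Single np.bincount call over offset bins (hypothetical helper)."""
    N = b.shape[1]
    n = ix.max() + 1                          # bins needed per column
    ix2D = ix[:, None] + n * np.arange(N)     # column j uses bins [j*n, (j+1)*n)
    a = np.zeros((M, N))
    flat = np.bincount(ix2D.ravel(), weights=b.ravel(), minlength=n * N)
    a[:n] += flat.reshape(N, n).T             # back to (n, N) layout
    return a


# Illustrative sizes only; M must be at least ix.max() + 1.
rng = np.random.default_rng(0)
M, N, P = 6, 3, 10
ix = rng.integers(0, M, size=P)
b = rng.random((P, N))

a_loop = bincount_columns_loop(ix, b, M)
a_vec = bincount_columns_vectorized(ix, b, M)
print(np.allclose(a_loop, a_vec))
```

Both versions sum the same values into the same rows, so they should agree up to floating-point rounding.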

Divakar

Okay I was close with the double `ravel()`. When you assign to `a[:n]`, would it be a problem if the indices don't align directly? For example, what if my indices are from 15 to 4000? That makes `n = 4001`, so doesn't `a[:n]` assign from 0 to 4000? – pstatix Feb 03 '20 at 19:56
@pstatix The proposed code is supposed to exactly simulate your loop-based code. If you have indices from 15 to 4000, your loop-based code would simply skip the rows below 15, and so would the proposed vectorized code. – Divakar Feb 03 '20 at 19:58
How could you skip it in the vectorized code? A quick run shows the results were shifted by 1 (as I thought they'd be). – pstatix Feb 03 '20 at 20:00
@pstatix So, in your actual use-case, if you have `4000` in `ix`, your loopy code with `np.bincount` would give an array with a minimum length of `4001`. So, if you are saying that `a` has a shape such that `M` (`a.shape[0]`) is 4000, your loopy code would throw an error too. Have you checked that your loopy code works with that actual use-case? – Divakar Feb 03 '20 at 20:09
@pstatix I suspect you are initializing `a` wrongly. Make sure number of rows in `a` is at least `ix.max()+1`. Else, neither your loopy code, nor the proposed vectorized code would work. – Divakar Feb 03 '20 at 20:24
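The point in the comments above can be checked on a small case. The sketch below uses made-up data in which the smallest index is 15: rows 0 to 14 of `a` simply stay zero, and an `a` with fewer than `ix.max() + 1` rows makes the loop version fail as well. The variable names `a_small` and `loop_failed` are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: indexes start at 15, so rows 0..14 receive nothing.
ix = np.array([15, 20, 15])
b = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
N = b.shape[1]
n = ix.max() + 1                      # 21 bins per column

a = np.zeros((n, N))                  # at least ix.max() + 1 rows
ix2D = ix[:, None] + n * np.arange(N)
a[:n] += np.bincount(ix2D.ravel(), weights=b.ravel(), minlength=n * N).reshape(N, -1).T

print(a[:15].any())                   # rows 0..14 are untouched
print(a[15], a[20])                   # row 15 = b[0] + b[2], row 20 = b[1]

# With one row too few, the in-place add in the loop version hits a
# shape mismatch, because bincount still returns ix.max() + 1 entries.
a_small = np.zeros((n - 1, N))
loop_failed = False
try:
    a_small[:, 0] += np.bincount(ix, weights=b[:, 0], minlength=n - 1)
except ValueError as exc:
    loop_failed = True
    print("loop version fails:", exc)
```

No shift is needed for the "missing" low indexes: the bins for 0 to 14 exist but receive zero weight.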
@Divakar I am trying to adapt your solution for a similar issue that I describe here (https://stackoverflow.com/q/62719951/1476932). Could you elaborate a bit on what `ix2D` does? – ttsesm Jul 06 '20 at 10:00
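To illustrate what `ix2D` holds, here is a tiny worked case with made-up numbers (not taken from the original posts). Each column of `b` gets its own block of `n` bins, so a single flat `np.bincount` call can never mix values from different columns.

```python
import numpy as np

# Illustrative inputs: 3 rows of values, 2 columns, index 2 repeats.
ix = np.array([0, 2, 2])
b = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
N = b.shape[1]
n = ix.max() + 1                      # 3 bins per column

offsets = n * np.arange(N)            # [0, 3]: start of each column's bin block
ix2D = ix[:, None] + offsets          # [[0, 3], [2, 5], [2, 5]]

# Column 0 values land in bins 0..2, column 1 values in bins 3..5.
flat = np.bincount(ix2D.ravel(), weights=b.ravel(), minlength=n * N)

# Undo the flattening: one row of length n per column, then transpose.
per_column = flat.reshape(N, n).T     # shape (n, N)
print(ix2D)
print(per_column)
```

Row 2 of `per_column` holds the sums of the two entries that share index 2, while row 1 stays zero because no entry uses index 1.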