How does numpy broadcasting perform faster?

Question

In the following question, https://stackoverflow.com/a/40056135/5714445

Numpy's broadcasting provides a solution that's almost 6x faster than using np.setdiff1d() paired with np.view(). How does it manage to do this?

And using A[~((A[:,None,:] == B).all(-1)).any(1)] speeds it up even more. Interesting, but raises yet another question. How does this perform even better?

My bad. Faster compared to using np.setdiff1d paired with np.view — R. S. Nikhil Krishna, Oct 15 '16 at 07:37
Not a comprehensive answer, but `setdiff1d` requires numerous full sorting operations behind the scenes whereas the line of code above just allocates a block of memory, fills it and then reduces it down to a 1D array to index `A` with. However, using code such as `(A[:,None,:] == B)` becomes very inefficient as `A` and `B` get larger (potentially massive blocks of memory must be allocated and filled), so it may actually become the slower option. — Alex Riley, Oct 15 '16 at 09:01

Divakar · Accepted Answer · 2016-10-15T10:48:31.930

I would try to answer the second part of the question.

So, with it we are comparing :

A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]  (I)

and

A[~((A[:,None,:] == B).all(-1)).any(1)]

To compare with a matching perspective against the first one, we could write down the second approach like this -

A[(((~(A[:,None,:] == B)).any(2))).all(1)]         (II)

The major difference when considering performance, would be the fact that with the first one, we are getting non-matches with subtraction and then checking for non-zeros with .any(). Thus, any() is made to operate on an array of non-boolean dtype array. In the second approach, instead we are feeding it a boolean array obtained with A[:,None,:] == B.

Let's do a small runtime test to see how .any() performs on int dtype vs boolean array -

In [141]: A = np.random.randint(0,9,(1000,1000)) # An int array

In [142]: %timeit A.any(0)
1000 loops, best of 3: 1.43 ms per loop

In [143]: A = np.random.randint(0,9,(1000,1000))>5 # A boolean array

In [144]: %timeit A.any(0)
10000 loops, best of 3: 164 µs per loop

So, with close to 9x speedup on this part, we see a huge advantage to use any() with boolean arrays. This I think was the biggest reason to make the second approach faster.

How does numpy broadcasting perform faster?

1 Answers1

Linked