
This is a follow-up to this SO answer:

https://stackoverflow.com/a/71185257/3259896

Moreover, note that `mean` is not very optimized for such a case. It is faster to use `(a[b[:,0]] + a[b[:,1]]) * 0.5`, although the intent is less clear.
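For reference, the two expressions compute the same thing. A minimal sketch (the shapes and data here are made up for illustration; only the names a and b follow the linked answer):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 3))                         # some points
b = rng.integers(0, 5, size=(4, 2))            # pairs of row indices into a

via_mean   = a[b].mean(axis=1)                 # readable: mean over each index pair
via_manual = (a[b[:, 0]] + a[b[:, 1]]) * 0.5   # two gathers, one add, one scale

assert np.allclose(via_mean, via_manual)
```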

This is further elaborated in the comments:

`mean` is optimized for 2 cases: the computation of the mean of contiguous lines along the last contiguous axis, or the computation of the mean of many long contiguous lines along a non-contiguous axis.

I looked up contiguous arrays and found them explained here:

What is the difference between contiguous and non-contiguous arrays?

It means the data is stored in an unbroken block of memory.
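For example, contiguity can be inspected directly through an array's flags (a minimal sketch using standard NumPy attributes):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)      # freshly built arrays are C-contiguous
print(a.flags['C_CONTIGUOUS'])       # True: one unbroken block, row after row

view = a[:, ::2]                     # a strided slice is a non-contiguous view
print(view.flags['C_CONTIGUOUS'])    # False: elements are spread out in memory

copy = np.ascontiguousarray(view)    # makes a contiguous copy when needed
print(copy.flags['C_CONTIGUOUS'])    # True
```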

However, it is still not clear to me whether there are any solid cases where I should use `mean` over just performing the calculations in Python.

I would love to have some solid examples of where and when to use each type of operation.

SantoshGupta7
  • This is pretty complex in practice, and for Numpy end-users like you, it is probably better just to test the different possibilities, as @hpaulj did. Not only does the memory layout matter, as you pointed out, but SIMD optimizations do too. The thing is, sometimes the Numpy code is optimized by compilers which make assumptions that are not true in practice, producing inefficient code. For more information about this case, you can read [this post](https://stackoverflow.com/a/70994975/12939557). Note that all reductions are computed the same way internally in Numpy, and so is `np.mean`. – Jérôme Richard Feb 20 '22 at 12:07
  • 1
    Moreover, the Numpy code is not fully optimized yet and there is many cases where the code is sub-optimal. We are currently working on improving the use of SIMD instruction to make Numpy faster but there are many cases where this is not trivial to do that efficiently. Thus, be aware that what is the best code for now may not be in a near future (eg. the next years). All of this is also dependent of the hardware and you can get a different behaviour on a very different platform (eg. ARM and PowerPC as well as very recent Intel AVX-512-cappable processors compared to mainstream x86-64 machines). – Jérôme Richard Feb 20 '22 at 12:13

1 Answer


While I've worked with numpy for a long time, I still have to do timings. I can predict some comparisons, but not all. In addition, there's a matter of scaling; your previous example was relatively small.
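(The arrays themselves aren't reproduced in this answer. A minimal setup consistent with the shapes below, assuming b holds integer row indices into a, might be:)

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 3))                  # (5, 3) array of values
b = rng.integers(0, 5, size=(3, 2))     # (3, 2) array of row indices into a
```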

With the (5,3) and (3,2) a and b:

In [145]: timeit np.add(a[b[:,0]],a[b[:,1]])/2
17.8 µs ± 24.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [146]: timeit (a[b[:,0]]+a[b[:,1]])/2
17.9 µs ± 302 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [147]: timeit (a[b[:,0]]+a[b[:,1]])/2
17.8 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [148]: timeit np.add(a[b[:,0]],a[b[:,1]])/2
18 µs ± 6.43 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [149]: timeit np.add.reduce(a[b],1)/2
19.3 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [150]: timeit np.sum(a[b],1)/2
25.1 µs ± 309 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [151]: timeit np.mean(a[b],1)
35.9 µs ± 853 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [152]: timeit a[b].mean(1)
29.4 µs ± 658 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [153]: timeit a[b].sum(1)/2
20.9 µs ± 885 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

While `a[b[:,0]]+a[b[:,1]]` is fastest, you probably don't want to expand that when b is (n, 5).
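For a wider b, the axis-reduction forms generalize without spelling out a term per column. A sketch, assuming b is still an integer index array (the shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((100, 3))
b = rng.integers(0, 100, size=(50, 5))      # now 5 indices per row

# a[b] has shape (50, 5, 3); reducing along axis 1 works for any column count.
row_means      = a[b].mean(axis=1)
row_means_fast = np.add.reduce(a[b], axis=1) / b.shape[1]

assert np.allclose(row_means, row_means_fast)
```

Both forms index once with a[b] and reduce along axis 1, so the code doesn't change as b gains columns.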

Note all these alternatives make full use of numpy array methods.

What you want to watch out for is using list-like iteration on an array, or performing array operations on lists, especially small ones. Making an array from a list takes time. Iterating over the elements of an array is slower than iterating over the elements of a list.
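A rough illustration of both costs (a sketch; the absolute numbers vary by machine, only the relative pattern matters):

```python
import timeit
import numpy as np

lst = list(range(1000))
arr = np.arange(1000)

# Converting a list to an array has a real cost; avoid doing it inside loops.
print(timeit.timeit(lambda: np.array(lst), number=10_000))

# Python-level iteration is slower over an array than over a list, because
# each array element is boxed into a NumPy scalar object on access.
print(timeit.timeit(lambda: [x + 1 for x in lst], number=10_000))
print(timeit.timeit(lambda: [x + 1 for x in arr], number=10_000))

# The whole-array operation is what the array methods are for.
print(timeit.timeit(lambda: arr + 1, number=10_000))
```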

hpaulj