
I am doing some performance analysis, and I wonder whether NumPy vectorizes its standard array operations when the datatype is known (double).

a, b = (some numpy arrays)
c = a + b  # Is this vectorized?

Edit: Is this operation vectorized, i.e. will the computation consist of SIMD operations?

hr0m
  • well... yes? Why would you think otherwise? – Julien Jul 06 '17 at 09:06
  • "Vectorized" in what sense? The usual sense in which the word is used in a NumPy context may not be the sense you're thinking of, if you're thinking of hardware-level SIMD operations. – user2357112 Jul 06 '17 at 09:07
  • Numpy doesn't even have 8bit or 4bit data types, so it would be hard for it to take advantage of wide SIMD operations :( – Thomas Ahle Dec 22 '22 at 09:12
  • That is just wrong, numpy has byte. And with avx512 float and double is also pretty wide. – hr0m Dec 23 '22 at 10:16

2 Answers

Yes, they are.

/*
 * This file is for the definitions of simd vectorized operations.
 *
 * Currently contains sse2 functions that are built on amd64, x32 or
 * non-generic builds (CFLAGS=-march=...)
 * In future it may contain other instruction sets like AVX or NEON detected
 * at runtime in which case it needs to be included indirectly via a file
 * compiled with special options (or use gcc target attributes) so the binary
 * stays portable.
 */

Link: NumPy simd.inc.src on GitHub.
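As a quick sanity check, you can confirm that element-wise operations run in a single compiled ufunc loop (where the SIMD code above is used), and inspect which SIMD features were detected. Note that `__cpu_features__` is a private attribute that only exists on newer NumPy versions (roughly 1.20+), so this is a sketch, not a stable API:

```python
import numpy as np

# Element-wise addition on float64 arrays runs in one compiled
# ufunc inner loop, which is where the SIMD dispatch happens.
a = np.array([1.0, 2.0, 3.0], dtype=np.float64)
b = np.array([4.0, 5.0, 6.0], dtype=np.float64)
c = a + b
print(c)  # [5. 7. 9.]

# On newer NumPy builds this private dict reports which SIMD features
# the CPU supports (e.g. 'SSE2', 'AVX2', 'AVX512F'); it is internal
# and may move or disappear between versions.
try:
    from numpy.core._multiarray_umath import __cpu_features__
    print(sorted(k for k, v in __cpu_features__.items() if v))
except Exception:
    np.show_config()  # fallback: build info, including SIMD/BLAS flags
```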

henrikstroem
  • Actually, NumPy does not. [Here](https://chelseatroy.com/2018/11/07/code-mechanic-numpy-vectorization/) is an article that goes through NumPy code and demonstrates it through matrix multiplication. – Quazi Irfan Jan 14 '21 at 23:52
  • @QuaziIrfan The article is titled with the word "vectorization", but it really only examines "parallelization", and the only conclusion the author draws is: "it does not capitalize on parallelization". So it doesn't really say anything about SIMD. – Rustam A. May 29 '21 at 20:49
  • Does it mean only float16 or float32 benefit from SIMD on a x86 architecture? – Sergey Bushmanov Oct 18 '21 at 14:11

I notice there is a comment from Quazi Irfan on henrikstroem's answer which says NumPy doesn't exploit vectorization, citing a blog post in which the author tried to "prove" this by experiment.

Going through that blog post, I found a gap that may lead to a different conclusion: for NumPy arrays a and b, the expression a*b is different from np.dot(a,b). The a*b that the blog author tested is element-wise multiplication, not matrix multiplication (np.dot(a,b)), and not even a vector inner product. Yet the author still compared a*b against the original experiment, which runs np.dot(a,b). The complexities of these two operations are completely different!
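The distinction is easy to see in a minimal sketch: `a * b` multiplies matching elements, while `np.dot(a, b)` performs true matrix multiplication:

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# a * b is element-wise (Hadamard) multiplication: n*n multiplies
# for n x n matrices.
elementwise = a * b
print(elementwise)  # [[ 5. 12.]
                    #  [21. 32.]]

# np.dot(a, b) is matrix multiplication (~n^3 operations), dispatched
# to the linked BLAS library for 2-D floating-point arrays.
matmul = np.dot(a, b)
print(matmul)  # [[19. 22.]
               #  [43. 50.]]
```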

NumPy certainly exploits vectorization via SIMD and BLAS, which can be seen in its source code. The official NumPy distribution supports a set of parallel operations (like np.dot), but not every function (like np.where or np.mean) is parallelized. The blog author may have chosen an inappropriate (unvectorized) function for the comparison.

We can also see this in multi-core CPU usage: when executing np.dot(), all cores show high utilization. Hence NumPy must be vectorized (via BLAS); otherwise, CPython's GIL would restrict it to a single core.
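You can check which BLAS/LAPACK implementation your NumPy binary is linked against (and hence where the multi-core np.dot behaviour comes from) with `np.show_config()`:

```python
import numpy as np

# Prints the build configuration, including which BLAS/LAPACK
# implementation (OpenBLAS, MKL, ...) this NumPy binary is linked
# against -- the library that lets np.dot use all cores.
np.show_config()
```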

  • 1) Could you provide the blog post? 2) Thread-level parallelism has nothing to do with instruction-level parallelism. – hr0m Dec 02 '21 at 21:21