I want to test whether all elements of an array are zero. According to the StackOverflow posts "Test if numpy array contains only zeros" and https://stackoverflow.com/a/72976775/5269892, `not array.any()` should be both the most memory-efficient and the fastest method, compared to `(array == 0).all()`.

I tested the performance with an array of random floating-point numbers, see below. However, at least for the given array size, `not array.any()` and even casting the array to boolean type appear to be slower than `(array == 0).all()`. How come?

import numpy as np

np.random.seed(100)
a = np.random.rand(10418*144)

%timeit (a == 0)
%timeit (a == 0).all()
%timeit a.astype(bool)
%timeit a.any()
%timeit not a.any()

# 711 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 740 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.69 ms ± 587 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 2.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
bproxauf
  • 1
    I get different results from what you got (Python 3.9.13, 1.23.0): 617 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 624 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 254 µs ± 702 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 262 µs ± 655 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 262 µs ± 714 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each) –  Jul 15 '22 at 08:54
  • Hmm, I'm using Python 3.7.6, numpy 1.21.2. No idea... – bproxauf Jul 15 '22 at 09:58
  • 1
    Note that in version 1.23, we improved basic reduction functions including np.all and np.any (see https://github.com/numpy/numpy/pull/21001), though the effect should be small for np.any and np.all. Updating NumPy might help a bit. – Jérôme Richard Jul 15 '22 at 10:12
  • 1
    @Murali This is surprising. Are you running on Windows? AFAIK the Windows builds often behave differently (and surprisingly). What is your processor architecture? – Jérôme Richard Jul 15 '22 at 10:13
  • 2
    @JérômeRichard I am using macOS (v 12.4), ARM architecture (M1 silicon). –  Jul 15 '22 at 11:23
  • 1
    An off topic hint: if you know that your array is positive (probably not your case) `a.sum()==0` is faster. – Salvatore Daniele Bianco Jul 15 '22 at 14:35
  • 1
    @SalvatoreDanieleBianco: unfortunately in general, `a` is a complex-valued array, but for positive real arrays, your tip will indeed provide a speed-up – bproxauf Jul 18 '22 at 06:42

1 Answer

The difference arises because the first two operations are vectorized using SIMD instructions, while the last three are not. More specifically, the last three calls perform an implicit conversion to bool (`_aligned_contig_cast_double_to_bool`), which is not yet vectorized. This is a known issue, and I have already proposed a pull request for it (which revealed some unexpected issues caused by undefined behavior, now fixed). If all goes well, it should be available in the next major release of NumPy.
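
To check whether the cast is the bottleneck on a given machine and NumPy version, the measurements from the question can be repeated outside IPython with the standard `timeit` module. This is only a rough sketch: the helper name `best_time` and the repeat counts are assumptions chosen for illustration, and the absolute numbers will vary with hardware and NumPy build.

```python
import timeit

import numpy as np

rng = np.random.default_rng(100)
a = rng.random(10418 * 144)  # same array size as in the question


def best_time(stmt, number=20, repeat=5):
    """Hypothetical helper: best per-call time in seconds for `stmt`."""
    timer = timeit.Timer(stmt, globals={"a": a})
    return min(timer.repeat(repeat=repeat, number=number)) / number


statements = (
    "(a == 0)",
    "(a == 0).all()",
    "a.astype(bool)",
    "a.any()",
    "not a.any()",
)

timings = {stmt: best_time(stmt) for stmt in statements}
for stmt, seconds in timings.items():
    print(f"{stmt:<16} {seconds * 1e6:9.1f} µs")
```

If `a.astype(bool)` takes about as long as `a.any()`, that is consistent with the cast dominating the cost of the reduction.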

Note that `a.any()` and `not a.any()` implicitly cast the input to an array of booleans so that the `any` operation itself can then run faster. This is not very efficient, but it is done this way to reduce the number of generated function variants: NumPy is written in C, so a different implementation has to be generated for each type, and optimizing many variants is hard, so we prefer to perform implicit casts here. This also reduces the size of the generated binaries. If this is not enough, you can use Cython to generate faster, specialized code.
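
Short of writing Cython, one option on affected NumPy versions is simply to keep the comparison-based check from the question. The sketch below wraps it in a helper (the name `all_zero` is hypothetical) and checks on a few small arrays, including complex ones, that it agrees with `not a.any()` and with an explicit cast to bool. It illustrates that the results match and makes no claim about speed.

```python
import numpy as np


def all_zero(arr):
    """Hypothetical helper: True if every element of `arr` equals zero."""
    return bool((arr == 0).all())


samples = {
    "float zeros": np.zeros(5),
    "float with one nonzero": np.array([0.0, 0.0, 1e-300]),
    "complex zeros": np.zeros(5, dtype=complex),
    "complex, imaginary part only": np.array([0j, 1j]),
    "negative zero": np.array([-0.0, 0.0]),
}

for name, arr in samples.items():
    via_compare = all_zero(arr)
    via_any = not arr.any()                    # implicit cast to bool
    via_cast = not arr.astype(bool).any()      # explicit cast to bool
    assert via_compare == via_any == via_cast  # all three checks agree
    print(f"{name:<30} all zero: {via_compare}")
```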

Jérôme Richard