10

I have a function that, given a numpy array of xy coordinates, it filters those which lies within a box of side L

import numpy as np
from numba import njit

np.random.seed(65238758)

L = 10
N = 1000
xy = np.random.uniform(0, 50, (N, 2))
box = np.array([
    [0,0],  # lower-left
    [L,L]  # upper-right
]) 

def sinjit(xy, box):
    mask = np.all(np.logical_and(xy >= box[0], xy <= box[1]), axis=1)
    return xy[mask]

If I run this function it returns the correct result:

sinjit(xy, box)

Output: array([[5.53200522, 7.86890708],
       [4.60188554, 9.15249881],
       [9.072563  , 5.6874726 ],
       [4.48976127, 8.73258166],
       ...
       [6.29683131, 5.34225758],
       [2.68057087, 5.09835442],
       [5.98608603, 4.87845464],
       [2.42049857, 6.34739079],
       [4.28586677, 5.79125413]])

But, as I want to speed this task in a loop by using numba, there exists a compatibility problem with "axis" argument in np.all function (it is not implemented in nopython mode). So, my question is, is it possible to avoid such argument in any way? any workaround?

Jacques Gaudin
  • 15,779
  • 10
  • 54
  • 75
Dani
  • 473
  • 3
  • 21
  • But why do you want numba here if your code is already vectorised presumably efficient? – yatu Apr 19 '20 at 13:16
  • Because this is only one iteration, I want to put it in a for loop to build up many cells with diferent domain ranges. And loops are quite inefficient in pure Python – Dani Apr 19 '20 at 13:20
  • I see so you'll be iterating over many `xy` arrays? – yatu Apr 19 '20 at 13:22
  • I want to iterate over many boxes. The main idea is that I have a principal 2D box of edge L with many particles inside, so I want to partition the domain into smaller cells – Dani Apr 19 '20 at 13:25
  • Apart from that, all the code goes into a compiled class with "jitclass", then the use of "numba" is mandatory – Dani Apr 19 '20 at 13:28

3 Answers3

7

I really, really, really wish numba supported optional keyword arguments. Until it does, I'm pretty much ignoring it. However, some hacks are possible here.

You'll need to take extra care for anything that has other than 2 dimensions or has lengths that might be zero.

import numpy as np
from numba import njit

@njit(cache=True)
def np_all_axis0(x):
    """Numba compatible version of np.all(x, axis=0)."""
    out = np.ones(x.shape[1], dtype=np.bool8)
    for i in range(x.shape[0]):
        out = np.logical_and(out, x[i, :])
    return out

@njit(cache=True)
def np_all_axis1(x):
    """Numba compatible version of np.all(x, axis=1)."""
    out = np.ones(x.shape[0], dtype=np.bool8)
    for i in range(x.shape[1]):
        out = np.logical_and(out, x[:, i])
    return out

@njit(cache=True)
def np_any_axis0(x):
    """Numba compatible version of np.any(x, axis=0)."""
    out = np.zeros(x.shape[1], dtype=np.bool8)
    for i in range(x.shape[0]):
        out = np.logical_or(out, x[i, :])
    return out

@njit(cache=True)
def np_any_axis1(x):
    """Numba compatible version of np.any(x, axis=1)."""
    out = np.zeros(x.shape[0], dtype=np.bool8)
    for i in range(x.shape[1]):
        out = np.logical_or(out, x[:, i])
    return out

if __name__ == '__main__':
    x = np.array([[1, 1, 0, 0], [1, 0, 1, 0]], dtype=bool)
    np.testing.assert_array_equal(np.all(x, axis=0), np_all_axis0(x))
    np.testing.assert_array_equal(np.all(x, axis=1), np_all_axis1(x))
    np.testing.assert_array_equal(np.any(x, axis=0), np_any_axis0(x))
    np.testing.assert_array_equal(np.any(x, axis=1), np_any_axis1(x))

I'm not sure how performant this will be, but if you really need to call the function in a higher level jit'ed function, then this will let you do it.

DStauffman
  • 3,960
  • 2
  • 21
  • 30
0

No optional flag for numpy.all() is supported by numba: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

If you insist to use numba, the only way is to code in one another way.

0

I encountered the same issue and decided to do some experimenting. I discovered that, if the axis axis can be an integer literal (i.e. it is known ahead of time and doesn't need to be retrieved from a variable), there is a Numba-compatible alternative. That being said, I also found these solutions to be slower with JIT compilation in my test function, so be sure to benchmark your function to make sure there is an actual net improvement if you want to use this.

As others have pointed out, Numba doesn't support the axis argument of several NumPy functions including np.all. The first potential solution I thought of is np.amin (aka np.ndarray.min): np.all(a, axis=axis) is identical to np.amin(a, axis=axis) for boolean arrays, and identical to np.amin(a, axis=axis).astype('bool') for numerical arrays. Unfortunately, np.amin is also in the list of functions for which the axis argument is not supported. However, np.argmin does support the axis argument, and so does np.take_along_axis.

Therefore, np.all(a, axis=axis) can be replaced with

For numeric arrays:

np.take_along_axis(a, np.expand_dims(np.argmin(a, axis=axis), axis), axis)[ (:, ){axis} 0].astype('bool')

For boolean arrays:

np.take_along_axis(a, np.expand_dims(np.argmin(a.astype('int64'), axis=axis), axis), axis)[ (:, ){axis} 0]

The separated parts, (:, ){axis}, should be replaced with axis repetitions of :, so that the correct axis is eliminated. For example, if a is a boolean array and axis is 2, you would use

np.take_along_axis(a, np.expand_dims(np.argmin(a.astype('int64'), axis=2), 2), 2)[:, :, 0].

Benchmarks

All I can say about this is, if you really need a numpy.all alternative within a function that overall would highly benefit from JIT compilation, this solution is suitable. If you're really just looking to speed up all by itself, you won't have much luck.

test.py

import numba
import numpy as np


# @numba.njit  # raises a TypingError
def using_all():
    n = np.arange(10000).reshape((-1, 5))  # numeric array
    b = n < 4888  # boolean array
    return (np.all(n, axis=1),
            np.all(b, axis=1))


# @numba.njit  # raises a TypingError
def using_amin():
    n = np.arange(10000).reshape((-1, 5))  # numeric array
    b = n < 4888  # boolean array
    return (np.amin(n, axis=1).astype('bool'),
            np.amin(b, axis=1))


@numba.njit  # doesn't raise a TypingError
def using_take_along_axis():
    n = np.arange(10000).reshape((-1, 5))  # numeric array
    b = n < 4888  # boolean array
    return (np.take_along_axis(n, np.expand_dims(np.argmin(n, axis=1), 1), 1)[:, 0].astype('bool'),
            np.take_along_axis(b, np.expand_dims(np.argmin(b.astype('int64'), axis=1), 1), 1)[:, 0])


if __name__ == '__main__':
    a = using_all()
    m = using_amin()
    assert np.all(a[0] == m[0])
    assert np.all(a[1] == m[1])
    t = using_take_along_axis()
    assert np.all(a[0] == t[0])
    assert np.all(a[1] == t[1])
PS C:\> python -m timeit -n 10000 -s 'from test import using_all; using_all()' 'using_all()'           
10000 loops, best of 5: 32.9 usec per loop
PS C:\> python -m timeit -n 10000 -s 'from test import using_amin; using_amin()' 'using_amin()'
10000 loops, best of 5: 43.5 usec per loop
PS C:\> python -m timeit -n 10000 -s 'from test import using_take_along_axis; using_take_along_axis()' 'using_take_along_axis()'
10000 loops, best of 5: 55.4 usec per loop
QuaternionsRock
  • 832
  • 7
  • 14