I have recently written a script to convert BGR arrays of [0, 1] floats to HSL and back. I posted it on Code Review. There is currently one answer but it doesn't improve performance.

I have benchmarked my code against cv2.cvtColor and found my code to be inefficient, so I want to compile the code with Numba to make it run faster.

I tried decorating every function with @nb.njit(cache=True, fastmath=True), but that doesn't work.

So I tested each piece of NumPy syntax and each NumPy function I used individually, and found two functions that don't work with Numba.

I need to find the maximum channel of each pixel (np.max(img, axis=-1)) and the minimum channel of each pixel (np.min(img, axis=-1)), but the axis argument doesn't work with Numba.

I searched Google for this, but the only remotely relevant thing I found is this, which only implements np.any and np.all and only works for two-dimensional arrays, whereas my arrays are three-dimensional.

I could write a for-loop-based solution, but I won't, because it is bound to be inefficient and defeats the purpose of using NumPy and Numba in the first place.

Minimal reproducible example:

import numba as nb
import numpy as np

@nb.njit(cache=True, fastmath=True)
def max_per_cell(arr):
    return np.max(arr, axis=-1)

@nb.njit(cache=True, fastmath=True)
def min_per_cell(arr):
    return np.min(arr, axis=-1)

img = np.random.random((3, 4, 3))
max_per_cell(img)
min_per_cell(img)

Exception:

In [2]: max_per_cell(img)
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 max_per_cell(img)

File C:\Python310\lib\site-packages\numba\core\dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
    464         msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
    465                f"by the following argument(s):\n{args_str}\n")
    466         e.patch_message(msg)
--> 468     error_rewrite(e, 'typing')
    469 except errors.UnsupportedError as e:
    470     # Something unsupported is present in the user code, add help info
    471     error_rewrite(e, 'unsupported_error')

File C:\Python310\lib\site-packages\numba\core\dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    407     raise e
    408 else:
--> 409     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function amax at 0x0000014E306D3370>) found for signature:

 >>> amax(array(float64, 3d, C), axis=Literal[int](-1))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'npy_max': File: numba\np\arraymath.py: Line 541.
    With argument(s): '(array(float64, 3d, C), axis=int64)':
   Rejected as the implementation raised a specific error:
     TypingError: got an unexpected keyword argument 'axis'
  raised from C:\Python310\lib\site-packages\numba\core\typing\templates.py:784

During: resolving callee type: Function(<function amax at 0x0000014E306D3370>)
During: typing of call at <ipython-input-1-b3894b8b12b8> (10)


File "<ipython-input-1-b3894b8b12b8>", line 10:
def max_per_cell(arr):
    return np.max(arr, axis=-1)
    ^

How can I fix this?

jared
Ξένη Γήινος
  • Well, as the link you provided mentions, this is not yet supported by Numba. If you wish to add the support in Numba, that would be great, since many Numba users would like such a feature. As of now, the solution is provided by the link you mention. You just need to add a loop for the 3D case and use another function in the loop body... If this post is a request for a new feature not yet supported in Numba, then GitHub issues are made for that. – Jérôme Richard Sep 01 '23 at 17:19
  • *"it is bound to be inefficient"* This statement is actually wrong. Numba generally uses basic loops internally. This is also why some NumPy functions are slower under Numba than in NumPy alone (`np.where` for example). NumPy developers sometimes use (SIMD) intrinsics while Numba does not. While Numba developers could do that, it is currently a pain to do. – Jérôme Richard Sep 01 '23 at 17:23

1 Answer

It's reasonably straightforward to implement this without np.max(), using loops instead:

@nb.njit()
def max_per_cell_nb(arr):
    ret = np.empty(arr.shape[:-1], dtype=arr.dtype)
    n, m = ret.shape
    for i in range(n):
        for j in range(m):
            max_ = arr[i, j, 0]
            max_ = max(max_, arr[i, j, 1])
            max_ = max(max_, arr[i, j, 2])
            ret[i, j] = max_
    return ret

In my benchmark, max_per_cell_nb turns out to be about 16x faster than np.max(arr, axis=-1).

%timeit max_per_cell_nb(img)
4.88 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit max_per_cell(img)
81 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

While benchmarking this, I made the following assumptions:

  • The image is 1920x1080x3. (In other words, it's a big image.)
  • The image array is in C order rather than Fortran order. If it's in Fortran order, my method slows down to 7 ms, and np.max() speeds up to 15 ms. See "Check if numpy array is contiguous?" for how to tell if your array is in C or Fortran order. Your example of np.random.random((3, 4, 3)) is C contiguous.
  • I'm comparing this function to np.max(arr, axis=-1) with Numba JIT turned off, because Numba can't really optimize single calls to NumPy functions.
Nick ODell
  • Good. Note that the speedup is mainly due to the fact that the last dimension is small and NumPy is very inefficient in that case (due to an internal generic iterator meant to support many features like broadcasting). The problem is that the input layout is inefficient in the first place. 1920x1080x3 should be replaced by the 3x1920x1080 layout (like OpenCV does AFAIK). With the latter, compilers (and so NumPy and Numba) can generate significantly faster code using SIMD instructions (30 to 60 times faster when the code is not memory bound). – Jérôme Richard Sep 01 '23 at 21:18
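
To illustrate the layout suggested in the comment above, here is a small NumPy-only sketch that makes a contiguous channels-first copy of an image and reduces over the leading axis. The variable names are made up for illustration and the tiny array is only there to keep the example quick. It shows the mechanics only and makes no timing claim.

```python
import numpy as np

rng = np.random.default_rng(0)
img_hwc = rng.random((4, 5, 3))  # height x width x channels

# Move channels to the front and force a contiguous copy: shape (3, 4, 5).
img_chw = np.ascontiguousarray(np.moveaxis(img_hwc, -1, 0))

# Both arrays report C-contiguity through the flags attribute.
print(img_hwc.flags["C_CONTIGUOUS"], img_chw.flags["C_CONTIGUOUS"])

# Per-pixel maximum is now a reduction over axis 0 ...
max_chw = img_chw.max(axis=0)

# ... or, equivalently for three channels, an elementwise maximum of planes.
max_planes = np.maximum(np.maximum(img_chw[0], img_chw[1]), img_chw[2])
```

Both results have shape (4, 5) and equal img_hwc.max(axis=-1). Note that the conversion itself copies the whole image, so it only makes sense if the rest of the pipeline can stay in the channels-first layout.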