I have an array of 3 dimensional with shape (height, width, 3) representing a BGR image with values being floats between 0 and 1 (inclusive).
I need to find both the minimum and maximum channel of each pixel in one loop. If img
is the array, then the effect would be the same as np.dstack([img.max(axis=-1), img.min(axis=-1)])
, and I wish to do it in one loop.
I have already tried to implement it myself, and I have beaten NumPy builtin methods by far, but the efficiency is only because Numba. I think perhaps there can be more efficient ways to do this.
My idea is simple, with three numbers a, b, c, there are 6 possible relationships:
a <= b <= c
a <= c <= b
b <= a <= c
b < c < a
c < a < b
c < b < a
With 3 conditional variable swaps I have managed to find minimum and maximum in one function without calling min
and max
:
def extrema(a, b, c):
if b > c:
b, c = c, b
if a > b:
a, b = b, a
if b > c:
b, c = c, b
return a, c
My method is faster than min
, max
calls:
In [9]: %timeit extrema(29, 500, 12)
235 ns ± 14.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [10]: %timeit max((29, 500, 12))
186 ns ± 7.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [11]: %timeit min((29, 500, 12))
186 ns ± 5.74 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
So I have implemented the following functions for benchmarking:
import numba as nb
import numpy as np
def extrema_img(img: np.ndarray) -> np.ndarray:
ext = img.copy()
for a, b in ((1, 2), (0, 1), (1, 2)):
temp = ext[..., a][mask := ext[..., a] > ext[..., b]]
ext[mask, a] = ext[mask, b]
ext[mask, b] = temp
return ext
def extrema_img_1(img: np.ndarray) -> np.ndarray:
(h, w) = img.shape[:2]
ext = np.zeros((h, w, 2))
for y in range(h):
for x in range(w):
a, b, c = img[y, x]
if b > c:
b, c = c, b
if a > b:
a, b = b, a
if b > c:
b, c = c, b
ext[y, x] = (a, c)
return ext
@nb.njit(cache=True, fastmath=True)
def extrema_img_nb1(img: np.ndarray) -> np.ndarray:
(h, w) = img.shape[:2]
ext = np.zeros((h, w, 2))
for y in range(h):
for x in range(w):
a, b, c = img[y, x]
if b > c:
b, c = c, b
if a > b:
a, b = b, a
if b > c:
b, c = c, b
ext[y, x] = (a, c)
return ext
@nb.njit(cache=True, fastmath=True)
def extrema_img_nb3(img: np.ndarray) -> np.ndarray:
(h, w) = img.shape[:2]
ext = np.zeros((h, w, 2))
for y in range(h):
for x in range(w):
ext[y, x] = np.sort(img[y, x])[:3:2]
return ext
def extrema_img_7(img: np.ndarray) -> np.ndarray:
return np.sort(img, axis=-1)
def extrema_img_8(img: np.ndarray) -> tuple:
return img.min(axis=-1), img.max(axis=-1)
With img = np.random.random((1080, 1920, 3))
, the benchmark results are below:
In [2]: %timeit extrema_img(img)
338 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit extrema_img_1(img)
3.88 s ± 55.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]: %timeit extrema_img_nb1(img)
18.6 ms ± 788 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: %timeit extrema_img_nb1(img)
19.6 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: %timeit extrema_img_nb3(img)
1.67 s ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit extrema_img_7(img)
89.3 ms ± 3.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [8]: %timeit extrema_img_8(img)
184 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
extrema_img_nb1
is the fastest, but extrema_img_1
is much slower than the boolean masking based solution, I have tried to compile the masking based solution but it doesn't work, I guess I should post a separate question.
How can it be faster?