I have an array of true/false values with a very simple structure:
```python
import numpy as np

# the real array has hundreds of thousands of items
positions = np.array([True, False, False, False, True, True, True, True,
                      False, False, False], dtype=bool)
```
I want to traverse this array and output the places where changes happen (True becomes False or vice versa). For this purpose, I've put together two different approaches (both sketched below):
- a recursive binary search (check whether all values in a segment are the same; if not, split it in two and recurse)
- a purely iterative search (loop through all elements and compare each one with the previous)
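For context, here is a minimal sketch of what the two approaches look like; the names and exact details are illustrative, the real implementations are in the gist linked at the end.

```python
import numpy as np

def iterative_search(arr):
    # walk the array once and record every index where the value flips
    changes = []
    for i in range(1, len(arr)):
        if arr[i] != arr[i - 1]:
            changes.append(i)
    return changes

def binary_search(arr, lo=0, hi=None, changes=None):
    # a uniform segment (all True or all False) contains no change;
    # otherwise split it in half and recurse into both halves
    if hi is None:
        hi = len(arr)
    if changes is None:
        changes = []
    if hi - lo < 2:
        return changes
    segment = arr[lo:hi]
    if segment.all() or not segment.any():
        return changes
    mid = (lo + hi) // 2
    binary_search(arr, lo, mid, changes)
    if arr[mid - 1] != arr[mid]:  # a flip can sit exactly at the split point
        changes.append(mid)
    binary_search(arr, mid, hi, changes)
    return changes
```

On the sample array above, both return [1, 4, 8].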
Both versions give exactly the result that I want; however, Numba speeds one up far more than the other. With a dummy array of 300k values, here are the performance results:
- pure Python binary-search runs in 11 ms
- pure Python iterative-search runs in 1.1 s (100x slower than binary-search)
- Numba binary-search runs in 5 ms (only about 2x faster than the pure Python equivalent)
- Numba iterative-search runs in 900 µs (about 1,200x faster than the pure Python equivalent)
As a result, when using Numba, `binary_search` is about 5x slower than `iterative_search`, while in theory it should still be about 100x faster: if Numba accelerated it by the same ~1,200x factor, it would run in roughly 9 µs.
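For reference, timings like the ones above can be obtained with a harness along these lines (a hypothetical sketch using timeit, not my exact setup; the dummy array construction is illustrative):

```python
import timeit

import numpy as np

# hypothetical 300k-element dummy array with a handful of change points
positions = np.tile(np.array([True] * 1000 + [False] * 1000, dtype=bool), 150)

for fn in (binary_search, iterative_search):
    # best of 10 single runs, reported in milliseconds
    best = min(timeit.repeat(lambda: fn(positions), number=1, repeat=10))
    print(f"{fn.__name__}: {best * 1e3:.3f} ms")
```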
What can be done to make Numba accelerate `binary_search()` as much as it accelerates `iterative_search()`?
Code for both approaches (along with a sample `positions` array) is available in this public gist: https://gist.github.com/JivanRoquet/d58989aa0a4598e060ec2c705b9f3d8f
Note: Numba is not running `binary_search()` in object mode; with `nopython=True` specified, it doesn't complain and happily compiles the function.
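For completeness, the Numba versions are decorated with `nopython=True`. A minimal sketch of what the accelerated iterative version might look like (the function name and the preallocated output array are my choices here, not necessarily what the gist does):

```python
import numpy as np
from numba import jit

# with nopython=True, Numba raises a TypingError instead of silently
# falling back to object mode if it cannot compile the function
@jit(nopython=True)
def iterative_search_nb(arr):
    # preallocate the worst-case output and trim at the end,
    # which avoids growing a Python list inside the compiled loop
    out = np.empty(arr.shape[0], dtype=np.int64)
    n = 0
    for i in range(1, arr.shape[0]):
        if arr[i] != arr[i - 1]:
            out[n] = i
            n += 1
    return out[:n]
```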