2

I'm new to using Numba and keen to try and understand how it works.

I built a binary search function:

def binarysearch(alist, item):
        first = 0
        last = len(alist) - 1
        found = False

        while first<=last:
            midpoint = (first + last)//2            
            if alist[midpoint] == item:
                return midpoint
            else:
                if item < alist[midpoint]:
                    last = midpoint-1
                else:
                    first = midpoint+1  
        return -1

And if I run this as follows:

l = list(range(10000000))

%timeit
binarysearch(l,5000)

I get: 4.88 µs ± 315 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

If I do the same but use @numba.njit, I get: 19.8 s ± 533 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Any ideas for how to get the improvements of Numba?

versions: Python: 3.6.3 Numba: 0.40.0

Xander
  • 21
  • 1
  • 1
    That’s interesting. Have you tried a typed array instead of a list? Also maybe profile it by having it search for random numbers instead of always the same one. Could you count the number of iterations? 19s seems awfully slow for numba. – AlexNe Jul 18 '20 at 14:34
  • 1
    Just tried: ``` import random from array import array l = array('i',range(10000000)) x = random.randint(0,10000000) ``` and got a time of: ```531 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)``` and for the raw python: ```5.57 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)``` So that seems to have worked... – Xander Jul 18 '20 at 15:02
  • I’m glad I was able to help. – AlexNe Jul 18 '20 at 15:09
  • Any idea why lists don't work and array's do? Even with typing? – Xander Jul 18 '20 at 15:10
  • 1
    Not really. In Python, list items can be of arbitrary type and size, it might be interesting on reading up on those: https://stackoverflow.com/questions/24171491/binary-layout-of-python-lists. Numba might have difficulties optimizing that. OTOH the memory address of each item in a numpy array/typed array/etc. is know before the loop even starts, so I’d assume this makes for way less overhead. – AlexNe Jul 18 '20 at 15:18

0 Answers0