Python optimization with numpy min, max (or numba)

Question

I need to squeeze as much performance as possible out of Python and numpy.

My data looks like this:

a1 = np.array(np.random.random(500000) * 1000)
a2 = np.array(np.random.random(500000) * 5000)

Given two different ndarrays a1 and a2, I want to calculate the gap between the maximum of a1 and the minimum of a2.

numpy:

np.max(a1) - np.min(a2)

numba:

@nb.jit(nb.float64(nb.float64[:], nb.float64[:]), cache=True, fastmath=True)
def nb_max_min(s1, s2):
    return np.max(s1) - np.min(s2)

But the results were disappointing:

min-max(numba): 1.574092000000249 ms
max-max(numpy): 1.4246419999999205 ms

I would like to get the calculation down to ~0.xx ms if possible. How can I optimize this further?

update

I measured only the max - min part. Here is my timing code:

import time


def timing(label, fn):
    t0 = time.perf_counter()
    fn()
    t1 = time.perf_counter()
    print('{}: {} ms'.format(label, (t1 - t0) * 1000))

Here is all of my code:

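import numba as nb
import numpy as np


# The arguments are 1-D float64 arrays, so the signature uses float64[:].
@nb.jit(nb.float64(nb.float64[:], nb.float64[:]), cache=True, fastmath=True)
def nb_max_min(s1, s2):
    return np.max(s1) - np.min(s2)


periods = 500000  # assumed: not defined in the original post; matches the array size above
a1 = np.random.random(periods) * 2000
a2 = np.random.random(periods) * 1000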
timing('nb_min_max', lambda: nb_max_min(a1, a2))
timing('nb_min_max', lambda: nb_max_min(a1, a2))
timing('nb_min_max', lambda: nb_max_min(a1, a2))
timing('max-max', lambda: np.max(a1) - np.min(a2))
timing('max-max', lambda: np.max(a1) - np.min(a2))
timing('max-max', lambda: np.max(a1) - np.min(a2))

And this is the result:

nb_min_max: 0.728947999999896 ms
nb_min_max: 1.0030130000000526 ms
nb_min_max: 1.3124690000001493 ms
max-max: 1.662436000000156 ms
max-max: 0.9315169999997153 ms
max-max: 1.9570019999992638 ms

I also tried timeit:

%timeit np.max(a1) - np.min(a2)

475 µs ± 9.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I think this is about as fast as it gets in Python. The numpy and numba results do not differ significantly. As user2699 commented, Fortran may be the last remaining option for optimizing this.

Your operation `np.max(a1) - np.min(a2)` is already vectorized; I doubt you can make it significantly faster — Brenlla, Jul 25 '18 at 15:50
My maximum row count might be under 600,000. This is the slowest part of my code... sad — bsdo64, Jul 25 '18 at 15:54
It is not clear whether you measured only the max - min part or also the creation of the arrays — juvian, Jul 25 '18 at 16:22
Possible duplicate of [numpy: function for simultaneous max() and min()](https://stackoverflow.com/questions/12200580/numpy-function-for-simultaneous-max-and-min) — user2699, Jul 25 '18 at 19:59
@juvian I measured only the max - min part. I'm using a 15-inch MacBook Pro with Touch Bar. I think it is a very slow computer by now. — bsdo64, Jul 26 '18 at 00:52

score 1 · Answer 1 · answered Jul 25 '18 at 16:53

Using the '%timeit' magic in ipython, I got the following results:

Array generation:

%%timeit
a1 = np.array(np.random.random(500000) * 1000)
a2 = np.array(np.random.random(500000) * 5000)
# 23.3 ms

min-max gap:

%%timeit
np.max(a1) - np.min(a2)
# 444 µs

I think this is already very fast. Maybe you measured some additional overhead, as @juvian suggested?
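
One way to rule out that kind of overhead outside IPython is the standard-library `timeit` module, which runs the call many times so that you can take the best of several repeats. The following is only a sketch: the helper name `best_ms` and the values of `number` and `repeat` are my own choices, not something from the question.

```python
import timeit

import numpy as np

# Build the arrays once, outside the timed region.
a1 = np.random.random(500000) * 1000
a2 = np.random.random(500000) * 5000


def best_ms(fn, number=100, repeat=5):
    """Return the best per-call time of fn in milliseconds.

    best_ms is a hypothetical helper; number and repeat are arbitrary.
    """
    # timeit.repeat returns one total time (in seconds) per repeat.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number * 1000


gap_ms = best_ms(lambda: np.max(a1) - np.min(a2))
print('max-min gap: {:.3f} ms per call'.format(gap_ms))
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during the measurement.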

score 1 · Accepted Answer · answered Jul 25 '18 at 21:38

How do you get these really slow timings?

Code

import numba as nb
import numpy as np
import time

a1 = np.array(np.random.random(500000) * 1000)
a2 = np.array(np.random.random(500000) * 5000)

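@nb.jit(nb.float64(nb.float64[:], nb.float64[:]), parallel=True, fastmath=True)
def nb_max_min(s1, s2):
    return np.max(s1) - np.min(s2)


def np_max_min(s1, s2):
    return np.max(s1) - np.min(s2)


# Time 10000 calls: total seconds / 10000 calls * 1000 ms/s = total seconds / 10.
t1 = time.time()
for i in range(10000):
    res_1 = np_max_min(a1, a2)

print(str((time.time() - t1) / 10) + ' ms')

# The explicit signature compiles nb_max_min eagerly at decoration time,
# so the loop below does not include JIT compilation.
t1 = time.time()
for i in range(10000):
    res_2 = nb_max_min(a1, a2)

print(str((time.time() - t1) / 10) + ' ms')
print(np.allclose(res_1, res_2))  # check that both versions agree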

Results

Numpy: 0.298ms (=26.8 GB/s)
Numba: 0.243ms (=33 GB/s)

Discussion

These simple operations are memory-bound. The maximum memory throughput of my 4th-gen Core i7 is 25.6 GB/s. Numba even significantly exceeds the memory bandwidth because of cache effects (the problem fits more or less in the L3 cache). In real code the timings may be worse, because the input arrays may not already be in the L3 cache.
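
The GB/s figures in the results follow from the array sizes, assuming each float64 array is read exactly once. That assumption and the helper name `throughput_gb_s` are only for illustration. A small sketch of the arithmetic:

```python
# Effective throughput = bytes read / elapsed time.
N_ELEMENTS = 500000      # elements per array
N_ARRAYS = 2             # a1 and a2 are each read once (assumption)
BYTES_PER_FLOAT64 = 8


def throughput_gb_s(elapsed_ms):
    """Effective read throughput in GB/s (1 GB = 1e9 bytes)."""
    total_bytes = N_ELEMENTS * N_ARRAYS * BYTES_PER_FLOAT64  # 8 MB in total
    return total_bytes / (elapsed_ms / 1000) / 1e9


print('Numpy: {:.1f} GB/s'.format(throughput_gb_s(0.298)))  # 26.8 GB/s
print('Numba: {:.1f} GB/s'.format(throughput_gb_s(0.243)))  # 32.9 GB/s
```

Both arrays together are 8 MB, which is why the working set can fit more or less in the L3 cache.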

That is because my MacBook Pro with Touch Bar uses an i7-6700HQ, which is a mobile processor. It is slower than my 3-year-old desktop PC, which has an i5-6600. — bsdo64, Jul 26 '18 at 01:20
