I am trying to evaluate the performance of numpy linked to ATLAS compared to numpy linked to OpenBLAS. I get some strange results for ATLAS, which I describe below.
The Python code for evaluating matrix-matrix multiplication (aka sgemm) looks like this:
import sys
sys.path.insert(0, "numpy-1.8.1")
import numpy
import timeit
for i in range(100, 501, 100):
    setup = "import numpy; m1 = numpy.random.rand(%d, %d).astype(numpy.float32)" % (i, i)
    timer = timeit.Timer("numpy.dot(m1, m1)", setup)
    times = timer.repeat(100, 1)
    print "%3d" % i,
    print "%7.4f" % numpy.mean(times),
    print "%7.4f" % numpy.min(times),
    print "%7.4f" % numpy.max(times)
If I run this script with numpy linked to ATLAS, I get large variations in the measured time. The first column shows the matrix size, followed by the mean, min and max of the execution times (in seconds) obtained by running the matrix-matrix multiplication 100 times:
100 0.0003 0.0003 0.0004
200 0.0023 0.0010 0.0073
300 0.0052 0.0026 0.0178
400 0.0148 0.0066 0.0283
500 0.0295 0.0169 0.0531
If I repeat this procedure with numpy linked to OpenBLAS using one thread, the running times are much more stable:
100 0.0002 0.0002 0.0003
200 0.0014 0.0014 0.0015
300 0.0044 0.0044 0.0047
400 0.0102 0.0101 0.0105
500 0.0169 0.0168 0.0177
Can anybody explain this observation?
Edit: Additional information:
The observed min and max values for ATLAS are not outliers; the times are distributed over the given range.
I uploaded the ATLAS times for i=500 to https://gist.github.com/uweschmitt/768bd165477d7c14095e
The given times come from a different run, so avg, min and max values differ slightly.
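To make "distributed over the given range" more concrete than mean, min and max alone, the timings could be summarized with quartiles. This is only a minimal sketch: the helper name `summarize` is my own, and it assumes `times` is the list returned by `timer.repeat(100, 1)` in the script above.

```python
import numpy


def summarize(times):
    """Return mean, min, quartiles and max of a sequence of timings."""
    t = numpy.asarray(times, dtype=float)
    # Quartiles show whether the values cluster near the minimum
    # (a few outliers) or are spread across the whole range.
    p25, p50, p75 = numpy.percentile(t, [25, 50, 75])
    return {
        "mean": float(t.mean()),
        "min": float(t.min()),
        "p25": float(p25),
        "median": float(p50),
        "p75": float(p75),
        "max": float(t.max()),
    }
```

If the median and the 75th percentile lie well above the minimum, the spread is not caused by a handful of slow runs.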
Edit: Additional finding:
Could CPU throttling (http://www.scipy.org/scipylib/building/linux.html#step-1-disable-cpu-throttling) be the cause? I do not know enough about CPU throttling to judge its impact on my measurements. Unfortunately, I cannot enable or disable it on my target machine.
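Even without permission to change the setting, it may be possible to at least read which frequency governor is active. The sketch below assumes a Linux machine that exposes the cpufreq sysfs interface under `/sys/devices/system/cpu/`; the function name `read_governors` is hypothetical, and on systems without that interface it simply returns an empty dict.

```python
import glob


def read_governors(pattern="/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
    """Map each matching sysfs file to the governor name it contains.

    Assumption: the default pattern matches the usual Linux cpufreq
    layout; it may not exist on every kernel or inside containers.
    """
    governors = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                governors[path] = f.read().strip()
        except IOError:
            # The file exists but cannot be read; record that explicitly.
            governors[path] = None
    return governors
```

A governor such as `ondemand` or `powersave` would mean the clock frequency can change during the benchmark, whereas `performance` keeps it at the maximum.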