I've been looking for ways to easily multithread some of my simple analysis code since I had noticed numpy it was only using one core, despite the fact that it is supposed to be multithreaded.
I know that numpy is configured for multiple cores, since I can see tests using numpy.dot use all my cores, so I just reimplemented mean as a dot product, and it runs way faster. Is there some reason mean can't run this fast on its own? I find similar behavior for larger arrays, although the ratio is close to 2 than the 3 shown in my example.
I've been reading a bunch of posts on similar numpy speed issues, and apparently its way more complicated than I would have thought. Any insight would be helpful, I'd prefer to just use mean since it's more readable and less code, but I might switch to dot based means.
In [27]: data = numpy.random.rand(10,10)
In [28]: a = numpy.ones(10)
In [29]: %timeit numpy.dot(data,a)/10.0
100000 loops, best of 3: 4.8 us per loop
In [30]: %timeit numpy.mean(data,axis=1)
100000 loops, best of 3: 14.8 us per loop
In [31]: numpy.dot(data,a)/10.0 - numpy.mean(data,axis=1)
Out[31]:
array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 1.11022302e-16, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
-1.11022302e-16])