All the code was run on the same machine, on Linux.
In Python:
import numpy as np
drr = abs(np.random.randn(100000,50))
%timeit np.log2(drr)
10 loops, best of 3: 77.9 ms per loop
In C++ (compiled with g++ -o log ./log.cpp -std=c++11 -O3):
#include <iostream>
#include <cmath>
#include <random>
#include <ctime>

int main()
{
    std::mt19937 e2(0);
    std::normal_distribution<> dist(0, 1);
    const int n_seq = 100000;
    const int l_seq = 50;
    static double x[n_seq][l_seq];
    for (int n = 0; n < n_seq; ++n) {
        for (int k = 0; k < l_seq; ++k) {
            x[n][k] = std::fabs(dist(e2));  // std::fabs, not the integer abs
            if (x[n][k] <= 0)
                x[n][k] = 0.1;
        }
    }
    clock_t begin = clock();
    for (int n = 0; n < n_seq; ++n) {
        for (int k = 0; k < l_seq; ++k) {
            x[n][k] = std::log2(x[n][k]);
        }
    }
    clock_t end = clock();
    std::cout << double(end - begin) / CLOCKS_PER_SEC * 1000.0 << " ms\n";
    return 0;
}
This runs in 60 ms.
In MATLAB:
abr = abs(randn(100000,50));
tic;abr=log2(abr);toc
Elapsed time is 7.8 ms.
I can understand the speed difference between C++ and numpy, but MATLAB beats everything. I've come across http://fastapprox.googlecode.com/svn/trunk/fastapprox/src/fastonebigheader.h, but it only handles float, not double, and I'm not sure how to convert it to double.
I also tried http://hackage.haskell.org/package/approximate-0.2.2.1/src/cbits/fast.c, which has fast log functions; compiled as a numpy ufunc, it runs in 20 ms, which is great, but the loss of accuracy is significant.
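For reference, the crude bit-twiddling variant from fastapprox ("fasterlog2") is easy to reproduce in numpy to measure that accuracy loss directly; a minimal sketch, assuming float32 inputs, with the bias constant copied from fastonebigheader.h:

```python
import numpy as np

def fast_log2(x):
    # fastapprox-style "fasterlog2": the float32 bit pattern, read as an
    # int32, is already a scaled piecewise-linear approximation of log2.
    # 1/2^23 rescales the raw bits; 126.94269504 is the fastapprox bias.
    i = x.astype(np.float32).view(np.int32)
    return i * (1.0 / (1 << 23)) - 126.94269504

drr = np.abs(np.random.randn(100000, 50)) + 1e-6
err = np.max(np.abs(fast_log2(drr) - np.log2(drr)))
# Fast, but the maximum absolute error is on the order of 0.06 bits,
# which matches the "significant accuracy loss" seen above.
```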
Any ideas on how to achieve the magical log2 speed that MATLAB gets?
UPDATE
Thank you all for the comments; that was very fast and very helpful! Indeed, the answer is parallelisation, i.e. spreading the load over several threads. Following @morningsun's suggestion,
%timeit numexpr.evaluate('log(drr)')
gives 5.6 ms, which is on par with MATLAB, thank you! (numexpr is MKL-enabled.)
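For anyone without an MKL-enabled numexpr, the same parallelisation idea can be sketched with plain numpy and a thread pool; np.log2 releases the GIL, so ordinary threads scale for arrays this large (the thread count here is an arbitrary choice):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_log2(a, n_threads=4):
    # Split the rows into contiguous chunks and let each thread run
    # np.log2 on its own slice; slices are views, so every thread
    # writes directly into the shared `out` array.
    out = np.empty_like(a)
    bounds = np.linspace(0, a.shape[0], n_threads + 1, dtype=int)
    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        np.log2(a[lo:hi], out=out[lo:hi])
    with ThreadPoolExecutor(n_threads) as ex:
        list(ex.map(work, range(n_threads)))
    return out

drr = np.abs(np.random.randn(100000, 50)) + 1e-12
res = parallel_log2(drr)
```

This is a sketch of the technique rather than a tuned implementation; numexpr does the chunking and threading internally, which is why it needs only a one-line expression.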