Why is this code in Python so much faster than in C++?

Question

In Python I have the following simple code:

N = 10000
mu = 0.0001
iterations = 10000
l = 10

@nb.njit()
def func1(N, l, mu, iterations):
    intList = [0]*N
    for x in range(iterations):
        for number in range(N):
            for position in range(l):
                if random.random() < mu:
                    intList[number] = intList[number] ^ (1 << position)

func1(N, l, mu, iterations)

count = 1
print(timeit(lambda: func1(N, l, mu, iterations), number=count))
>>> 5.677

I'm not used to C++ but wanted to see, how quick it would be compared to the Python version. Since my Python code is quite simple I thought I could give it a try. My C++ code that should be equivalent to the Python code is

#include <iostream>
#include <random>

using namespace std; 

int func1(int iterations, int l, int N, float mu)
{
    std::random_device rd;  //Will be used to obtain a seed for the random number engine
    std::mt19937 gen(rd()); //Standard mersenne_twister_engine seeded with rd()
    std::uniform_real_distribution<> dis(0.0, 1.0);
    std::vector<int> intList(N);

    //for (int i = 0; i < N; i++)
    //  cout << intList[i];
    //cout << "\n";
    int x, number, position;
    for (x = 0; x < iterations; x++) {
        for (number = 0; number < N; number++) {
            for (position = 0; position < l; position++) {
                if (dis(gen) < mu) {
                    intList[number] = intList[number] ^ (1 << position);
                }
            }
        }
    }
    //for (int i = 0; i < N; i++)
    //  cout << intList[i];
    return 0;
}

int main()
{   
    int N, l, iterations;
    float mu;
    N = 10000;
    mu = 0.0001;
    iterations = 10000;
    l = 10;
    func1(iterations, l, N, mu);
    cout << "\nFertig";
}

But this code takes up to 5-10 times longer. I'm really surprised by that. What is the explanation for that?

how do you compile it? optimizations turned on? And how did you measure? — 463035818_is_not_an_ai, Nov 18 '20 at 20:45
I don't know anything about `random.random`, but I'm willing to bet that `std::mt19937` is a slower but much better RNG engine. — super, Nov 18 '20 at 20:47
I used visualstudio and clicked "start without debugging" and used my watch :D. I'm a completely newbie to c++. — HighwayJohn, Nov 18 '20 at 20:49
"start without debugging" only means that you don't attach the debugger. It has no effect on how the program was build (which by default should be with optimizations turned off) — UnholySheep, Nov 18 '20 at 20:51
it means VS does not attach a debugger. The configuration of a freshly created VS solution will be Debug by default. — jwezorek, Nov 18 '20 at 20:53
I see. Let me do it correctly and check whether it is faster. — HighwayJohn, Nov 18 '20 at 20:55
It appears [Python random module uses Mersenne Twister 19937](https://docs.python.org/3.8/library/random.html). Strictly speaking, it's not `std::mt19937`, but a C implementation. — Fred Larson, Nov 18 '20 at 20:57
The python `timeit` command isn't printing elapsed time there - for me it prints out 5.88 even though Linux `time` for the python process shows 12.10s elapsed time. The C++ version runs in about 10.2s. The differences will be _much_ greater when most of the work isn't being done by optimised C functions anyway.... — Tony Delroy, Nov 18 '20 at 21:09
@FredLarson: And unlike the OP's code, it's seeded properly (not really the OP's fault; C++ has made is painful as possible to properly seed `std::mt19937`, while Python automatically fully seeds from system random under the hood without user intervention). — ShadowRanger, Nov 18 '20 at 21:10
@TonyDelroy: `timeit` times the function calls themselves (and generally correctly); Linux `time` is including the start up time, the `numba.njit` time, and the overhead of the timing loop itself (which is excluded from `timeit`'s reported time). The importing of `numba` and `njit`ing of a complicated function can take a few seconds all by itself. Of course, the use of `numba.njit` means this isn't really comparing "Python" to C++. It's comparing a somewhat restricted subset of Python that can be statically analyzed and converted to compiled code to C++. — ShadowRanger, Nov 18 '20 at 21:11
Thanks for the comments! It is interesting to see that it seems to be okay to use Python, even though performance is important. How should I seed it right? @ShadowRanger — HighwayJohn, Nov 18 '20 at 21:25
@HighwayJohn: [Fully seeding the C++ PRNGs is too hard to put in a comment](https://stackoverflow.com/q/45069219/364696). :-) — ShadowRanger, Nov 18 '20 at 21:26
After a week I came back to this problem and I still notice, that in VS the code is much slower. Does anyone know how to configure VS correctly? — HighwayJohn, Nov 27 '20 at 18:20

orlp · Accepted Answer · 2020-11-19T02:17:08.243

Numba internally translates random.random calls into its own inlined internal Mersenne Twister implementation. So effectively the entire func1 gets compiled down to efficient code by LLVM. It might as well have been written in C++ like the other implementation.

And so it's no surprise to me, that when I compile your C++ implementation with optimizations turned on, I can not reproduce your issue. Both implementation are essentially the same. On my machine the Python code runs in ~6.1 seconds and the C++ code in ~6.9 seconds.

If you wish to go faster however, note that if you wish to efficiently implement genetic mutation with low mutation probability (which it appears you are), you're better off first generating the Binomial distribution with probability μ, and then selecting that many indices without replacement from your genome length. Alternatively the method I describe here.

Good catch. I'm actually already thinking about that. See my post in https://codereview.stackexchange.com/questions/252099/random-changes-in-a-list-of-binary-numbers — HighwayJohn, Nov 18 '20 at 21:50

Why is this code in Python so much faster than in C++?

1 Answers1