
So, to summarize: I have exactly the same parameters in my code, but three different implementations:

  1. Defining a class for my list of particles SP, with three attributes per particle (around 1 minute).

  2. Defining SP as a 2D NumPy array (around 1 minute).

  3. Defining SP as a 2D NumPy array and compiling the loop with Numba (around 5 seconds).

Consider the first case:

import numpy as np
from math import sqrt, log, cos, sin, pi
from random import choice, random

n = 1000
mu = np.random.uniform(0, 1, n)
r = [sqrt(-2*log(1 - i)) for i in mu]      # Box-Muller-style radius
eta = np.random.uniform(0, 1, n)
theta = 2*pi*eta
cuz = [cos(i) for i in theta]
suz = [sin(i) for i in theta]
Zinitial = [a*b for a, b in zip(r, cuz)]   # initial positions
Pinitial = [a*b for a, b in zip(r, suz)]   # initial momenta
class Particle:
    def __init__(self, pos, mom, spin):
        self.pos = pos
        self.mom = mom
        self.spin = spin

SP = sorted([Particle(pos=i, mom=j, spin=choice([1, 0]))
             for i, j in zip(Zinitial, Pinitial)], key=lambda x: x.pos)
Upi = []
Downi = []
count_plot = []
for j in range(len(SP)):
    if SP[j].spin == 1:
        Upi.append(SP[j].pos)
    else:
        Downi.append(SP[j].pos)
Zavgi = sum(Zinitial)/len(Zinitial)
Zreli = sum(Upi)/len(Upi) - sum(Downi)/len(Downi)

# Observables
Zavg = [Zavgi]
Zrelm = [Zreli]
T_plot = [0]

"Time"

iter=10**(4);
dt=1/(2*n);
alf=sqrt(n);

"Dynamics"


counter=0;
sum1,sum2=0,0;

for i in range(1, iter + 1):
    t = i*dt
    T_plot.append(t)
    Z = []
    Up = []
    Down = []
    c, s = cos(t), sin(t)
    c1, s1 = cos(t - dt), sin(t - dt)
    for j in range(n - 1):
        # A pair has collided if the ordering of the two neighbouring positions
        # (under free harmonic evolution) flipped between t-dt and t.
        collchk = (((c*SP[j].pos + s*SP[j].mom) - (c*SP[j+1].pos + s*SP[j+1].mom))
                   * ((c1*SP[j].pos + s1*SP[j].mom) - (c1*SP[j+1].pos + s1*SP[j+1].mom)))
        # Relative momentum of the pair at time t.
        prel = ((c*SP[j].mom - s*SP[j].pos) - (c*SP[j+1].mom - s*SP[j+1].pos))/2
        rcoeff = 1/(1 + (prel*alf)**2)
        rand_value = random()

        if collchk < 0:
            # Swapping list elements swaps the object references (safe for a list).
            SP[j], SP[j+1] = SP[j+1], SP[j]
            if rcoeff > rand_value:
                counter = counter + 1
                SP[j].spin, SP[j+1].spin = SP[j+1].spin, SP[j].spin
        if SP[j].spin == 1:
            Up.append(c*SP[j].pos + s*SP[j].mom)
        else:
            Down.append(c*SP[j].pos + s*SP[j].mom)
        Z.append(c*SP[j].pos + s*SP[j].mom)

    Zrel = sum(Up)/len(Up) - sum(Down)/len(Down)
    Zrelm.append(Zrel)

    Zm = sum(Z)/len(Z)
    Zavg.append(Zm)


print("Rate of collision per particle = ", counter/(n*(dt*iter)))

 

The OUTPUT is: Rate of collision per particle = 0.0722

The second case:

# (same imports as in the first case)
n = 1000
mu = np.random.uniform(0, 1, n)
r = np.array([sqrt(-2*log(1 - i)) for i in mu])
eta = np.random.uniform(0, 1, n)
theta = 2*pi*eta
cuz = np.array([cos(i) for i in theta])
suz = np.array([sin(i) for i in theta])
Zinitial = np.multiply(r, cuz)
Pinitial = np.multiply(r, suz)

SP = np.array(sorted([[i, j, choice([1, 0])] for i, j in zip(Zinitial, Pinitial)],
                     key=lambda x: x[0]))

Upi = []
Downi = []
count_plot = []
for j in range(len(SP)):
    if SP[j][2] == 1:
        Upi.append(SP[j][0])
    else:
        Downi.append(SP[j][0])
Zavgi = sum(Zinitial)/len(Zinitial)
Zreli = sum(Upi)/len(Upi) - sum(Downi)/len(Downi)

# Observables
Zavg = [Zavgi]
Zrelm = [Zreli]
T_plot = [0]

"Time"

iter=10**(4);
dt=1/(2*n);
alf=sqrt(n);

"Dynamics"


counter=0;
sum1,sum2=0,0;

for i in range(1, iter + 1):
    t = i*dt
    T_plot.append(t)
    Z = []
    Up = []
    Down = []
    c, s = cos(t), sin(t)
    c1, s1 = cos(t - dt), sin(t - dt)
    for j in range(n - 1):
        collchk = (((c*SP[j][0] + s*SP[j][1]) - (c*SP[j+1][0] + s*SP[j+1][1]))
                   * ((c1*SP[j][0] + s1*SP[j][1]) - (c1*SP[j+1][0] + s1*SP[j+1][1])))
        prel = ((c*SP[j][1] - s*SP[j][0]) - (c*SP[j+1][1] - s*SP[j+1][0]))/2
        rcoeff = 1/(1 + (prel*alf)**2)
        rand_value = random()

        if collchk < 0:
            # Intended to swap the two neighbouring rows; note that SP is now
            # a 2D array, so SP[j] and SP[j+1] are views.
            SP[j], SP[j+1] = SP[j+1], SP[j]
            if rcoeff > rand_value:
                counter = counter + 1
                SP[j][2], SP[j+1][2] = SP[j+1][2], SP[j][2]
        if SP[j][2] == 1:
            Up.append(c*SP[j][0] + s*SP[j][1])
        else:
            Down.append(c*SP[j][0] + s*SP[j][1])
        Z.append(c*SP[j][0] + s*SP[j][1])

    Zrel = sum(Up)/len(Up) - sum(Down)/len(Down)
    Zrelm.append(Zrel)

    Zm = sum(Z)/len(Z)
    Zavg.append(Zm)


print("Rate of collision per particle = ", counter/(n*(dt*iter)))

  

The OUTPUT is:

Rate of collision per particle = 0.0134

And the quickest case, using Numba:

import numpy as np
import matplotlib.pyplot as plt
from math import sqrt, log, cos, sin, pi
from random import choice, random
from numba import jit


"Dynamics"

@jit(nopython=True)
def f(SP, Zavgi, Zreli, alf, dt, n):
    "Time"
    counter = 0;
    
    Zavg = np.array([Zavgi]);
    Zrelm = np.array([Zreli]);
    T_plot = np.array([0]);
    for i in range(1, iter + 1):

        t = i * dt;
        np.append(T_plot,t)
        Z = [];
        Up = [];
        Down = [];
        c, s = cos(t), sin(t);
        c1, s1 = cos(t - dt), sin(t - dt);
        for j in range(n - 1):
            collchk = ((c * (SP[j][0]) + s * (SP[j][1])) - (c * (SP[j + 1][0]) + s * (SP[j + 1][1]))) * (
                    c1 * (SP[j][0]) + s1 * (SP[j][1]) - (c1 * (SP[j + 1][0]) + s1 * (SP[j + 1][1])));

            prel = ((c * (SP[j][1]) - s * (SP[j][0])) - (c * (SP[j + 1][1]) - s * (SP[j + 1][0]))) / 2;

            rcoeff = 1 / (1 + (prel * alf) ** 2);
            rand_value = random();

            if collchk < 0:

                SP[j], SP[j + 1] = SP[j + 1], SP[j];

                if rcoeff > rand_value:
                    counter = counter + 1
                    SP[j][2], SP[j + 1][2] = SP[j + 1][2], SP[j][2];
            if SP[j][2] == 1:
                Up.append(c * (SP[j][0]) + s * (SP[j][1]))
            else:
                Down.append(c * (SP[j][0]) + s * (SP[j][1]))
            Z.append(c * (SP[j][0]) + s * (SP[j][1]))

        Zrel = np.sum(np.array(Up)) / len(Up) - np.sum(np.array(Down)) / len(Down);
        Zrelm = np.append(Zrelm, Zrel)

        Zm = np.sum(np.array(Z)) / len(Z)
        Zavg = np.append(Zavg, Zm)


    return Zavg, Zrelm, counter, T_plot



if __name__ == '__main__':

    n = 1000
    mu = np.random.uniform(0, 1, n)
    r = [sqrt(-2*log(1 - i)) for i in mu]
    eta = np.random.uniform(0, 1, n)
    theta = 2*pi*eta
    cuz = [cos(i) for i in theta]
    suz = [sin(i) for i in theta]
    Zinitial = [a*b for a, b in zip(r, cuz)]
    Pinitial = [a*b for a, b in zip(r, suz)]

    iter = 10**4
    dt = 1/(2*n)
    alf = sqrt(n)

    SP = np.array(sorted([[i, j, choice([1, 0])] for i, j in zip(Zinitial, Pinitial)],
                         key=lambda x: x[0]))
    Upi = []
    Downi = []
    count_plot = []
    for j in range(len(SP)):
        if SP[j][2] == 1:
            Upi.append(SP[j][0])
        else:
            Downi.append(SP[j][0])
    Zavgi = np.sum(np.array(Zinitial))/len(Zinitial)
    Zreli = np.sum(np.array(Upi))/len(Upi) - np.sum(np.array(Downi))/len(Downi)

    # iter is passed explicitly rather than captured as a global inside
    # the jitted function.
    Zavg, Zrelm, counter, T_plot = f(SP, Zavgi, Zreli, alf, dt, n, iter)
    print("rate= ", counter/(n*(iter*dt)))
    

The OUTPUT is:

rate= 0.00814

As can be seen, the rates are different for all three cases even though the parameters are the same. The Numba version's rate differs by a factor of about 10.

Since the parameters are the same, I would expect the three methods to give very close rates.

Why is this happening?

EDIT:

I have shortened the runtime of the code by simply running it for a much smaller time. Because of this I have also removed the graphs, which look very odd on such short time scales. I want to stress that the problem seems to be twofold but related: the rates differ by an order of magnitude over many runs, and the graphs for the array case (with and without Numba) appear more jittery (which becomes really prominent for long runs) than those for the class case.

  • This is certainly due to a numerical instability. Why do you expect the result to be the same while calling `random()` in the middle of the code? Is the same code deterministic? Besides this, there is an error with the plotting: `ValueError: x and y must have same first dimension, but have shapes (1,) and (100001,)`. Can you fix this and provide a version of the code that does not require waiting 18+6 minutes to get the result (or the previous error)? It makes any debugging/check tricky. – Jérôme Richard May 29 '22 at 17:34
  • I will try to provide a version that takes less time, but the value error does not occur on my system. Maybe the problem comes from some indentation while copy-pasting the code, where an array ends up outside the for loop. I will still check it with the shorter versions if possible. Besides this, I expect it to give the same result while calling random() because the average behaviour (over the time loop) remains almost the same. This can be checked by re-running the same code (which will produce different random values) and getting almost the same rate. – Lost May 29 '22 at 17:49
  • @JérômeRichard Please check the edit. – Lost May 30 '22 at 09:23
  • Please add a [[SO]: How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). What are, in your opinion, close rates? Is it the same code (because it's not small, and not well written; check https://peps.python.org/pep-0008)? – CristiFati Jun 04 '22 at 09:30
  • @CristiFati I won't be able to reduce it further without getting rid of what I think are important things. And yes, it is the same code, just written with arrays and with Numba. – Lost Jun 04 '22 at 11:08

1 Answer


TL;DR: one problem comes from the naive line SP[j], SP[j + 1] = SP[j + 1], SP[j] behaving differently between the programs due to the different type of SP. Another comes from the random number generation, which behaves differently in Numba (hard to track without taking care of reproducibility).


Debugging

Before delving into the problem, the first thing to do is to change the code so that it produces deterministic results, so the error can be reproduced in a debugger and differences can be tracked. Then, you need to adapt the code so it runs quickly and is easy to debug, since the program is generally executed several times before the bug can be located. Then, you can log the values of the variables impacting counter (the variable causing the different results). Then, you can check the differences and locate a step where counter differs between the programs. Finally, you can backtrack to the origin of the bug step by step. This is a generic method to debug such a program quite efficiently. Note that it could be even more efficient with a reverse debugger (but it turns out I have not yet found one that works for Python on my machine).

The key to getting deterministic, reproducible results is the following code:

np.random.seed(0)  # seeds NumPy's generator
seed(0)            # seeds CPython's random module

Numba's random number generator is not synchronized with NumPy's by default, so the seeding call needs to be placed inside the Numba function in order to actually seed Numba's generator (rather than NumPy's, on the CPython side). Moreover, Numba's random() is likewise not synchronized with CPython's, and using the same seed is not sufficient since Numba does not appear to use the same algorithm internally anyway. Using np.random.rand() instead of random() fixes the reproducibility issue.
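
For illustration, here is a minimal sketch of that seeding pattern (the function name and the number of draws are placeholders, not part of the original code):

from numba import njit
import numpy as np

@njit
def seeded_draws(k):
    # Seeding inside the jitted function affects Numba's own generator.
    np.random.seed(0)
    out = np.empty(k)
    for i in range(k):
        out[i] = np.random.rand()  # supported in nopython mode
    return out

np.random.seed(0)       # seeds NumPy's generator on the CPython side
print(seeded_draws(5))  # prints the same values on every run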

Under the hood

Here are the backtracking steps to understand the root of the problem:

  • counter differs at a specific step (when (i,j)=(2,20) for n=100 and iter=3)
  • collchk and prel are set to 0 in the second program while they are both different from 0 in the first
  • SP[j] and SP[j+1] are equal in the second program but not in the first
  • SP[j], SP[j + 1] = SP[j + 1], SP[j] causes the values to become equal, but only in the second program

This line behaves differently in the first two programs because the former deals with a list of Particle objects while the latter deals with a 2D array of native values. In the former, the object references are properly swapped; in the latter, the operation acts on NumPy views, and assigning a view does not copy the underlying array content, resulting in an overwrite. Here is how it works internally:

SP[j], SP[j + 1] = SP[j + 1], SP[j] is implicitly translated to something equivalent to this:

tmp1 = SP[j+1,:]
tmp2 = SP[j,:]
SP[j,:] = tmp1
SP[j+1,:] = tmp2

The thing is that tmp1 and tmp2 are references to two NumPy views: SP[j+1,:] and SP[j,:]. Thus, SP[j,:] = tmp1 also overwrites the content seen through tmp2, which is a view of that same row. In the end, the row is replicated instead of swapped, resulting in a sneaky bug.
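
A minimal demonstration of the effect on a toy array (not code from the question):

import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
a[0], a[1] = a[1], a[0]   # both right-hand sides are views into a
print(a)                  # [[3. 4.]
                          #  [3. 4.]] -- the row is duplicated, not swapped

b = [[1.0, 2.0], [3.0, 4.0]]
b[0], b[1] = b[1], b[0]   # a plain list swaps the references correctly
print(b)                  # [[3.0, 4.0], [1.0, 2.0]]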

One solution is to explicitly call np.copy on the swapped rows. The question "Swap two values in a numpy array" also provides an interesting alternative (note that its second, validated answer is exactly the pattern currently causing the bogus results in your case, and the comments there pointed out that it is not a safe solution). Using SP[[j, j+1]] = SP[[j+1, j]] fixes the problem because SP[[j+1, j]] creates a temporary array (which is not a view of SP). Note that this fancy-indexing solution is not supported by Numba. Numba does support np.copy, but creating temporary arrays is expensive. For the Numba code, you need to use the following:

for k in range(SP.shape[1]):
    SP[j, k], SP[j + 1, k] = SP[j + 1, k], SP[j, k]

Finally, note that using np.random.rand with the above fix is sufficient to get the same results in Numba.
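
To make both fixes concrete, here is a small sketch (again a toy array, not the question's code) exercising the two safe variants:

import numpy as np

a = np.arange(8.0).reshape(4, 2)

# Variant 1: fancy indexing; a[[1, 0]] builds a temporary copy first
# (works in NumPy, but is not supported by Numba and allocates a temporary).
a[[0, 1]] = a[[1, 0]]

# Variant 2: element-wise swap (Numba-friendly, no temporary array).
j = 2
for k in range(a.shape[1]):
    a[j, k], a[j + 1, k] = a[j + 1, k], a[j, k]

print(a)  # rows 0/1 and rows 2/3 are each correctly exchanged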

  • @JérômeRichard Thanks a lot, especially for (a sort of crash course on) debugging. I find that the array method takes longer than the object method. You had mentioned earlier that object-oriented code generally takes longer than arrays, but in my case it somehow seems to be the opposite. Do you have an idea why that could happen? Is it because there are vectorized substitutions for the object operations in the array program, or anything else (for instance the for loops)? – Lost Jun 08 '22 at 16:49
  • What code did you measure? The Numba code or the CPython one? For Numba, and more generally any native code, object-oriented code tends to prevent vectorization (the use of SIMD instructions). That being said, if the code cannot be vectorized, then it is not much of a problem (both will be equally fast). Using too many arrays can also cause effects like cache thrashing due to cache associativity. This is a complex topic. Note that the main performance issue with OOP is often due to polymorphism/virtuality, which is not used here. – Jérôme Richard Jun 08 '22 at 23:07
  • In your case, I do not expect Numba to vectorize the code due to the conditional branches. The swap+append makes vectorization even harder, so there is no chance the compiler can vectorize this code. Thus, the OOP code may take a similar time because of that. The bounds checking of Numba may add checks in the array-based version that could make it slower (native code generally does not have this). – Jérôme Richard Jun 08 '22 at 23:13
  • In the end, both versions should be quite inefficient since they certainly operate on scalars, while modern x86-64 CPUs can operate on 4 items at once (even 8 for Intel server processors). Conditionals cause CPU stalls, making the code even more inefficient. That being said, your code is hard to vectorize, mainly because of the SP swap. Indeed, without it you could use a method called loop splitting so as to compute `collchk`, `prel` and `rcoeff` in a vectorized way (possibly even the random call); a sketch of this idea follows after the comments. Note that this cannot efficiently be done with the OOP version. – Jérôme Richard Jun 08 '22 at 23:22
  • If you measure the CPython code, then most of the time is overhead and both versions should be very inefficient. Timings will mainly depend on the number of checks in CPython and NumPy, but measuring the difference is not very relevant since you generally do not want such a big overhead in practical applications. Also note that NumPy functions operating on whole arrays are often vectorized, while those operating on objects fall back to slow CPython code. Scalar operations are very slow in both cases (see https://stackoverflow.com/questions/69584027). – Jérôme Richard Jun 08 '22 at 23:28
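
Regarding the loop splitting mentioned in the comments above, here is a rough sketch of the idea: the pairwise quantities are computed for all j at once from the current ordering, under the assumption that the sequential swap logic is handled separately afterwards. The function name is a placeholder; this is not a drop-in replacement for the loop in the question.

import numpy as np

def pairwise_quantities(SP, t, dt, alf):
    # Positions at times t and t-dt, and momenta at time t, for all particles.
    c, s = np.cos(t), np.sin(t)
    c1, s1 = np.cos(t - dt), np.sin(t - dt)
    z_now = c*SP[:, 0] + s*SP[:, 1]
    z_prev = c1*SP[:, 0] + s1*SP[:, 1]
    p_now = c*SP[:, 1] - s*SP[:, 0]
    # Vectorized collchk, prel and rcoeff for every adjacent pair (j, j+1).
    collchk = (z_now[:-1] - z_now[1:]) * (z_prev[:-1] - z_prev[1:])
    prel = (p_now[:-1] - p_now[1:]) / 2
    rcoeff = 1 / (1 + (prel*alf)**2)
    return collchk, prel, rcoeff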