I was writing a code example to demonstrate a race condition with numba.jit and parallel=True:
import numpy as np
import pandas as pd
from numba import jit, prange
from collections import Counter
import fractions
n = 10**6
m = 10**6
@jit(nopython=True, parallel=True)
def test():
    lst = [0]
    for i in prange(n):
        lst[0] += 1
    return lst
# Run the racy function m times and count how often each ratio result/n occurs.
error = Counter([str(fractions.Fraction(test()[0], n)) for _ in range(m)])
df = pd.DataFrame(error.items())

def func(x, y='1'):
    # Convert a 'numerator/denominator' string (or a bare integer) to a float.
    return int(x) / int(y)

df[2] = df[0].apply(lambda _str: func(*_str.split('/')))
df = df.sort_values(2)
ax = df.plot.bar(x=0, y=1)
ax.set_xlabel('ratio count/maximal_count')
ax.get_legend().remove()
What surprised me is that the miscounted totals caused by the race condition are multiples of n divided by the number of CPU cores. The bar chart produced by the code above shows the distribution.
I basically understand what's going on:
    lst[0] += 1
is short for
    buffer = lst[0]
    lst[0] = buffer + 1
If another thread does the same thing at the same time, one of them can overwrite the other's update at the wrong moment.
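To make that interleaving concrete, here is a minimal sketch in plain Python (no Numba and no real threads) that replays the read and write steps in a fixed order. The helper `run_schedule` and the worker names "A" and "B" are made up for illustration; this only models the two-step expansion above, not what Numba actually compiles.

```python
def run_schedule(schedule, start=0):
    """Replay the two halves of `lst[0] += 1` for several workers in a given order."""
    lst = [start]
    buffers = {}  # each worker's private copy of lst[0]
    for worker, step in schedule:
        if step == "read":
            buffers[worker] = lst[0]      # buffer = lst[0]
        elif step == "write":
            lst[0] = buffers[worker] + 1  # lst[0] = buffer + 1
        else:
            raise ValueError(f"unknown step: {step!r}")
    return lst[0]


# Each worker finishes its increment before the next one starts.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Both workers read the old value before either one writes.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run_schedule(serial))       # 2
print(run_schedule(interleaved))  # 1, one increment is lost
```

In the interleaved schedule both workers start from the same old value, so the second write simply repeats the first one and one increment disappears.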
I have two questions though:
- Can somebody confirm that the results are roughly distributed like this?
- And why are they distributed like this?
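For what it's worth, here is a toy model that would produce the "multiples of n / cores" pattern. It rests on an assumption I have not verified: that each thread loads lst[0] once, adds its whole chunk of n / cores iterations locally, and stores the result back once at the end. The function `simulate_once` and its parameters are hypothetical and written in plain Python, not Numba.

```python
import random
from collections import Counter


def simulate_once(n, workers, rng):
    """Toy model (assumption): one load and one store per worker, in random order."""
    chunk = n // workers
    # Every worker performs "load" first and "store" second.
    pending = {w: ["load", "store"] for w in range(workers)}
    shared = 0
    local = {}
    while pending:
        w = rng.choice(sorted(pending))  # pick which worker advances next
        step = pending[w].pop(0)
        if step == "load":
            local[w] = shared            # read the shared counter once
        else:
            shared = local[w] + chunk    # write back old value plus the whole chunk
        if not pending[w]:
            del pending[w]
    return shared


rng = random.Random(0)
n, workers = 1000, 4
counts = Counter(simulate_once(n, workers, rng) for _ in range(10_000))
for total in sorted(counts):
    print(total, total / n, counts[total])
```

Under this model every outcome is a multiple of n / workers, because a store can only replace the counter with a previously seen value plus one full chunk. Whether Numba's generated code really behaves this way is exactly what I would like to have confirmed.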