I have a simple Monte Carlo Pi computation program. I ran it on two different boxes (same hardware, slightly different kernel versions) and I see a significant performance drop on one of them: it takes roughly twice as long. Without threads, performance is mostly the same. Profiling shows that nearly all of the time goes into futex calls on both boxes, and that the slower box spends more time per futex call (8 vs. 5 usecs/call in the summaries below).
- Is this related to any kernel parameters?
- Can CPU flags affect futex performance? /proc/cpuinfo shows that the CPU flags differ slightly between the two boxes.
- Is this somehow related to the Python version?
Linux(3.10.0-123.20.1 (Red Hat 4.4.7-16)) Python 2.6.6
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.69   53.229549           5  10792796   5385605 futex
Profile Output
==============
         256 function calls in 26.189 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       39   26.186    0.671   26.186    0.671 :0(acquire)
Linux(3.10.0-514.26.2 (Red Hat 4.8.5-11)) Python 2.7.5
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.69   94.281979           8  11620358   5646413 futex
Profile Output
==============
         259 function calls in 53.448 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       38   53.445    1.406   53.445    1.406 :0(acquire)
Test Program
import random
import math
import time
import threading
import sys
import profile


def find_pi(tid, n):
    # Estimate Pi by sampling n random points in the unit square and
    # counting how many fall inside the quarter circle of radius 1.
    t0 = time.time()
    in_circle = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        dist = math.sqrt(pow(x, 2) + pow(y, 2))
        if dist < 1:
            in_circle += 1
    pi = 4.0 * (float(in_circle)/float(n))
    print 'Pi=%s - thread(%s) time=%.3f sec' % (pi, tid, time.time() - t0)
    return pi


def main():
    # Number of samples per thread; can be overridden on the command line.
    if len(sys.argv) > 1:
        n = int(sys.argv[1])
    else:
        n = 6000000
    t0 = time.time()
    threads = []
    num_threads = 5
    print 'n =', n
    # Start all worker threads, then wait for each of them to finish.
    for tid in range(num_threads):
        t = threading.Thread(target=find_pi, args=(tid,n,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()


#main()
profile.run('main()')
#profile.run('find_pi(1, 6000000)')
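
To compare threaded and non-threaded behaviour on each box without the profiler attached, a minimal sketch along the following lines could time the same workload run sequentially and in threads. This is not the program that produced the numbers above: the names `count_in_circle`, `run_sequential` and `run_threaded` are illustrative, each task is assumed to use its own seeded `random.Random` instance (the original shares the module-level generator), and a smaller `n` is assumed to keep the run short. Under CPython's global interpreter lock the threaded run is not expected to be faster, so the gap between the two timings gives a rough indication of how much time goes into lock contention.

```python
from __future__ import print_function

import random
import threading
import time


def count_in_circle(n, rng):
    """Count how many of n random points in the unit square lie inside the unit circle."""
    hits = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:
            hits += 1
    return hits


def run_sequential(num_tasks, n):
    """Run the tasks one after another; return (pi estimates, elapsed seconds)."""
    t0 = time.time()
    estimates = []
    for tid in range(num_tasks):
        # Seeding with the task id makes each task reproducible.
        hits = count_in_circle(n, random.Random(tid))
        estimates.append(4.0 * hits / n)
    return estimates, time.time() - t0


def run_threaded(num_tasks, n):
    """Run the same tasks in one thread each; return (pi estimates, elapsed seconds)."""
    estimates = [None] * num_tasks

    def worker(tid):
        hits = count_in_circle(n, random.Random(tid))
        estimates[tid] = 4.0 * hits / n

    t0 = time.time()
    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return estimates, time.time() - t0


if __name__ == '__main__':
    n = 100000          # assumed sample count per task, far below the original 6000000
    num_tasks = 5       # same number of workers as the original program
    _, seq_time = run_sequential(num_tasks, n)
    _, thr_time = run_threaded(num_tasks, n)
    print('sequential: %.3f sec' % seq_time)
    print('threaded:   %.3f sec' % thr_time)
```

Running this on both boxes, and under both Python versions if possible, would show whether the slowdown follows the kernel or the interpreter.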