I am running a simulation in which two methods are called every iteration and take up most of the computation time. The methods are:
def calculate_macrovars(f, n, ns, ex, ey, dens, velx, vely):
    fd = f[:,2:n+2,2:n+2]
    dens[:,:] = fd[0,:,:]
    velx[:,:] = 0.
    vely[:,:] = 0.
    for s in range(1,ns):
        fds = fd[s,:,:]
        dens += fds
        velx += ex[s]*fds
        vely += ey[s]*fds
    velx[:,:] /= dens
    vely[:,:] /= dens
and
def do_relaxation(f, ftmp, n, ns, ws, ex, ey, cssq, omega, dens, velx, vely):
    dist = f[:,2:n+2,2:n+2]
    dist_tmp = ftmp[:,2:n+2,2:n+2]
    vv = (velx*velx + vely*vely)/cssq
    for s in range(ns):
        ev = (ex[s]*velx + ey[s]*vely)/cssq
        dist_eq = ws[s]*dens*(1 + ev + 0.5*ev*ev - 0.5*vv)
        dist_tmp[s,:,:] = (1. - omega)*dist[s,:,:] + omega*dist_eq
The arguments to these functions are: 3d arrays f and ftmp (shape=ns x (n+4) x (n+4), i.e. an n x n interior plus two boundary layers on each side, so that f[:,2:n+2,2:n+2] is n x n; dtype=np.float32); scalars n and ns (int); 1d array ws (shape=ns x 1, dtype=np.float32); 1d arrays ex and ey (shape=ns x 1, dtype=np.int8); scalars cssq and omega (float); and 2d arrays dens, velx, vely (shape=n x n, dtype=np.float32).
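For concreteness, here is a minimal setup that reproduces these shapes and dtypes. The specific values of n and ns, and the D2Q9-style lattice weights and velocities, are assumptions for illustration only; the `f[:,2:n+2,2:n+2]` slice in both methods implies f carries two boundary layers on each side.

```python
import numpy as np

# Assumed sizes for illustration; the actual simulation values may differ.
n, ns = 64, 9

# Distribution arrays: shape (ns, n+4, n+4) so that f[:, 2:n+2, 2:n+2]
# selects the interior n x n region (two ghost layers per side).
f = np.ones((ns, n + 4, n + 4), dtype=np.float32)
ftmp = np.zeros_like(f)

# Hypothetical D2Q9 lattice weights and velocities.
ws = np.array([4/9] + [1/9]*4 + [1/36]*4, dtype=np.float32)
ex = np.array([0, 1, 0, -1, 0, 1, -1, -1, 1], dtype=np.int8)
ey = np.array([0, 0, 1, 0, -1, 1, 1, -1, -1], dtype=np.int8)

cssq = 1.0 / 3.0   # squared lattice speed of sound (assumed)
omega = 1.0        # relaxation parameter (assumed)

# Macroscopic fields written in place by both methods.
dens = np.empty((n, n), dtype=np.float32)
velx = np.empty((n, n), dtype=np.float32)
vely = np.empty((n, n), dtype=np.float32)
```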
I profiled a simulation run and found that the two methods take 25% and 69% of the runtime respectively, 94% of the total computation time (I have trimmed the profiles to remove all lines with negligible runtime):
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 def tgv_simulation():
84 4981 16577491 3328.1 24.9 calculate_macrovars()
89 4981 45899937 9215.0 68.9 do_relaxation()
91 4981 3614229 725.6 5.4 do_streaming()
Total time: 66.6111 s
Profiling calculate_macrovars() yields:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 def calculate_macrovars():
5 4981 16353 3.3 0.1 fd = f[:,2:n+2,2:n+2]
6 4981 376653 75.6 2.2 dens[:,:] = fd[0,:,:]
7 4981 256929 51.6 1.5 velx[:,:] = 0.
8 4981 246298 49.4 1.4 vely[:,:] = 0.
9 44829 41229 0.9 0.2 for s in range(1,ns):
10 39848 47308 1.2 0.3 fds = fd[s,:,:]
11 39848 3739018 93.8 21.6 dens += fds
12 39848 5699723 143.0 32.9 velx += ex[s]*fds
13 39848 5428619 136.2 31.4 vely += ey[s]*fds
15 4981 723137 145.2 4.2 velx[:,:] /= dens
16 4981 719715 144.5 4.2 vely[:,:] /= dens
Total time: 17.3031 s
Profiling do_relaxation() yields:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
19 def do_relaxation():
22 4981 17183 3.4 0.0 dist = f[:,2:n+2,2:n+2]
23 4981 6700 1.3 0.0 dist_tmp = ftmp[:,2:n+2,2:n+2]
24 4981 1510181 303.2 3.3 vv = (velx*velx + vely*vely)/cssq
25 49810 61256 1.2 0.1 for s in range(ns):
26 44829 12988964 289.7 28.7 ev = (ex[s]*velx + ey[s]*vely)/cssq
27 44829 18644169 415.9 41.2 dist_eq = ws[s]*dens*(1 + ev + 0.5*ev*ev - 0.5*vv)
28 44829 11993650 267.5 26.5 dist_tmp[s,:,:] = (1. - omega)*dist[s,:,:] + omega*dist_eq
Total time: 45.2221 s
What sort of optimization strategies should I employ to improve the performance of these two methods?
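For example, would collapsing the per-direction loop into a single contraction help? A hypothetical rewrite of calculate_macrovars using numpy.einsum, which contracts the direction axis against ex/ey without materializing the ns x n x n temporaries that a broadcasted multiply would create (same signature as the original; the ravel() calls are defensive in case ex/ey are stored as (ns, 1)):

```python
import numpy as np

def calculate_macrovars_einsum(f, n, ns, ex, ey, dens, velx, vely):
    # Interior n x n view of the distribution functions.
    fd = f[:, 2:n+2, 2:n+2]
    # Zeroth moment: density is the sum over all ns directions.
    dens[:, :] = fd.sum(axis=0)
    # First moments: einsum contracts the direction axis against ex/ey
    # directly into an (n, n) result, avoiding (ns, n, n) temporaries.
    velx[:, :] = np.einsum('s,sij->ij', ex.ravel().astype(np.float32), fd) / dens
    vely[:, :] = np.einsum('s,sij->ij', ey.ravel().astype(np.float32), fd) / dens
```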
Update: Vectorizing calculate_macrovars:
def calculate_macrovars(f, n, ns, ex, ey, dens, velx, vely):
    fd = f[:,2:n+2,2:n+2]
    dens[:,:] = numpy.sum(fd, axis=0)
    velx[:,:] = numpy.sum(fd*ex[:, None, None], axis=0)/dens
    vely[:,:] = numpy.sum(fd*ey[:, None, None], axis=0)/dens
profiled:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 def calculate_macrovars(f, n, ns, ex, ey, dens, velx, vely):
5 4981 15644 3.1 0.1 fd = f[:,2:n+2,2:n+2]
16 4981 4232570 849.7 23.9 dens[:,:] = numpy.sum(fd, axis=0)
17 4981 6887395 1382.7 38.9 velx[:,:] = numpy.sum(fd*ex[:, None, None], axis=0)/dens
18 4981 6557733 1316.5 37.1 vely[:,:] = numpy.sum(fd*ey[:, None, None], axis=0)/dens
Total time: 17.6933 s
Approximately the same performance as the looped version of calculate_macrovars.
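The same broadcasting idea also applies to do_relaxation, which dominates the runtime. A sketch with the original signature (untested for speed; the reshape(-1, 1, 1) calls assume one scalar weight/velocity per direction, whether stored as shape (ns,) or (ns, 1)):

```python
import numpy as np

def do_relaxation_vec(f, ftmp, n, ns, ws, ex, ey, cssq, omega, dens, velx, vely):
    dist = f[:, 2:n+2, 2:n+2]
    dist_tmp = ftmp[:, 2:n+2, 2:n+2]
    # Squared-velocity term, shared by all directions: shape (n, n).
    vv = (velx*velx + vely*vely) / cssq
    # ev for all ns directions at once: shape (ns, n, n).
    ev = (ex.reshape(-1, 1, 1)*velx + ey.reshape(-1, 1, 1)*vely) / cssq
    # Equilibrium distribution for all directions; ws broadcasts over (n, n).
    dist_eq = ws.reshape(-1, 1, 1) * dens * (1 + ev + 0.5*ev*ev - 0.5*vv)
    # Single relaxation-to-equilibrium update for the whole (ns, n, n) block.
    dist_tmp[:, :, :] = (1. - omega)*dist + omega*dist_eq
```

Whether this beats the loop depends on memory bandwidth: it trades ns passes over (n, n) arrays for a few passes over (ns, n, n) temporaries, which is the same trade the vectorized calculate_macrovars made.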