I am using MinGW-w64 to compile Cython code for Python 3.7. The task is purely an exercise in which I calculate the Julia set for a grid of points. I am following along with the book "High Performance Python", which walks through how to use Cython. I have previously used the -O flag to set the optimization level for DSP hardware and gotten significant improvements from raising it, e.g. to -O3. That is not the case for the Cython code, though.
When I do the same thing with the Cython code, it gets steadily slower as the optimization level increases, which seems to make no sense. The timings I get are:
-O1 = 0.41s
-O2 = 0.46s
-O3 = 0.47s
-Ofast = 0.48s
-Os = 0.419s
Does anyone have an idea why the optimization levels seem to be working in reverse?
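Since the gaps between these timings are only a few tens of milliseconds, one way to check whether they exceed run-to-run noise is to repeat each measurement and compare the minimum and median. The following is a minimal sketch, not part of the original code: `time_repeated` and `summarize` are hypothetical helper names, and the usage note assumes the `calculate` module and the `zs` and `cs` inputs from the main script.

```python
import statistics
import time


def time_repeated(fn, repeats=10):
    """Call fn() `repeats` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock intended for benchmarking
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (minimum, median) of the measured durations in seconds."""
    return min(durations), statistics.median(durations)
```

For each build, something like `summarize(time_repeated(lambda: calculate.calculate_z(300, zs, cs)))` could be run. If the minimum times of the different -O levels overlap, the ordering seen in a single run may just be noise.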
The Cython code is in the file cythonfn.pyx:
def calculate_z(maxiter, zs, cs):
    # add type primitives to improve execution time
    cdef unsigned int i, n
    cdef double complex z, c
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs
        # while n < maxiter and abs(z) < 2:
        while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:  # increases performance
            z = z * z + c
            n += 1
        output[i] = n
    return output
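When comparing builds, it may also be worth confirming that every optimization level produces the same output, since GCC's -Ofast enables -ffast-math and can change floating-point results. Below is a minimal pure-Python sketch of the same loop, usable as a reference. The name `calculate_z_reference` is hypothetical and comes from neither the book nor the code above.

```python
def calculate_z_reference(maxiter, zs, c):
    """Pure-Python version of the escape-time loop, for checking results only."""
    output = [0] * len(zs)
    for i, z in enumerate(zs):
        n = 0
        # Same escape test as the Cython code: |z|^2 < 4 avoids a square root.
        while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
            z = z * z + c
            n += 1
        output[i] = n
    return output
```

Comparing `calculate_z_reference(max_iter, zs, cs)` with the compiled module's result on a small grid would show whether a given flag changes the iteration counts. Small differences at points close to the escape boundary are possible, because the compiled code may round intermediate values differently.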
The build script, setup.py, is:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("calculate", ["cythonfn.pyx"])]
)
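The setup.py above does not show where the -O level is set. One common way, shown here as a sketch under that assumption, is the `extra_compile_args` argument of `Extension`. The helper names `extension_kwargs` and `run_build` and the environment variable `CYTHON_OPT_FLAG` are hypothetical, and the sketch uses setuptools with `cythonize` rather than the distutils imports above.

```python
import os


def extension_kwargs(opt_flag):
    """Keyword arguments for Extension, with one optimization flag appended."""
    return {
        "name": "calculate",
        "sources": ["cythonfn.pyx"],
        "extra_compile_args": [opt_flag],  # e.g. "-O1", "-O3", "-Ofast"
    }


def run_build():
    """Build the extension; imports are local so the helper above needs no extra packages."""
    from setuptools import Extension, setup
    from Cython.Build import cythonize

    # CYTHON_OPT_FLAG is a hypothetical variable name chosen for this sketch.
    flag = os.environ.get("CYTHON_OPT_FLAG", "-O2")
    setup(ext_modules=cythonize([Extension(**extension_kwargs(flag))]))
```

With `run_build()` called at the bottom of setup.py, `python setup.py build_ext --inplace` normally echoes the compiler command line, which makes it possible to check which -O flag the compiler actually receives. If several -O options appear on that line, GCC uses the last one.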
And the main code is:
import calculate
import time

c_real, c_imag = -0.62772, -0.42193  # point for investigation
x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8

def calc_julia(desired_width, max_iter):
    x_step = (float(x2 - x1) / float(desired_width))
    y_step = (float(y1 - y2) / float(desired_width))
    x = []
    y = []
    ycoord = y2
    while ycoord > y1:
        y.append(ycoord)
        ycoord += y_step
    xcoord = x1
    while xcoord < x2:
        x.append(xcoord)
        xcoord += x_step
    zs = []
    cs = complex(c_real, c_imag)
    for ycoord in y:
        for xcoord in x:
            zs.append(complex(xcoord, ycoord))
    st = time.time()
    output = calculate.calculate_z(max_iter, zs, cs)
    et = time.time()
    print(f"Took {et-st} seconds")

if __name__ == '__main__':
    calc_julia(desired_width=1000, max_iter=300)
And yes, I am aware that a number of things could be improved here; for example, cs could just be a scalar since the values are all the same. This is more of an exercise, but it led to surprising results for the optimization levels.