1

I am trying to improve the performance of calculating the Jonswap spectrum using cython. But it seems like much slower than the original code. How can I improve this?

cython code:

from libc.math cimport exp
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np

DTYPE_float = np.float64
ctypedef np.float64_t DTYPE_float_t

def jonswap(np.ndarray[DTYPE_float_t, ndim=1, mode ='c'] w, double Hs, double Tp, double gamma = 3.7):
    '''
    get Jonswap spectra
    :param w: np.ndarray Angular Frequency
    '''
    cdef:
        int n = w.shape[0]
        double *sigma = <double*>malloc(n * sizeof(double)) 
        double *a = <double*>malloc(n * sizeof(double)) 
        int i 
    cdef double wp
    cdef np.ndarray[DTYPE_float_t, ndim=1, mode='c'] sj = np.ones(n, dtype=DTYPE_float)

    wp = 2 * np.pi / Tp
    for i in range(n):
        sigma[i] = 0.07 if w[i] < wp else 0.09
        a[i] = exp(-0.5 * pow((w[i] - wp) / (sigma[i] * w[i]), 2.0))
        sj[i] = 320 * pow(Hs, 2) * pow(w[i], -5.0) / pow(Tp, 4) * exp(-1950 * pow(w[i], -4) / pow(Tp, 4)) * pow(gamma, a[i])

    return sj

original code:

def jonswap(w: np.ndarray, Hs: float, Tp: float, gamma: float = 3.7) -> np.ndarray:
    '''
    get Jonswap spectra
    :param w: np.ndarray Angular Frequency
    '''
    omega = w
    wp = 2 * np.pi / Tp
    sigma = np.where(omega < wp, 0.07, 0.09)
    a = np.exp(-0.5 * np.power((omega - wp) / (sigma * omega), 2.0))
    sj = 320 * np.power(Hs, 2) * np.power(omega, -5.0) / np.power(Tp, 4) * \
          np.exp(-1950 * np.power(omega, -4) / np.power(Tp, 4)) * np.power(gamma, a)

    return sj
kneehow
  • 21
  • 2

1 Answers1

2

Your original code is all vectorized numpy ops, so the room for improvement is limited. Running cython with the annotate flag (-a) points out the following possible improvements.

  • use the libc pow instead of the python builtin
  • elide boundschecking/wraparound semantics
  • use c division to turn off divide by 0 checking (if this is safe to do!)

New cython version

from libc.math cimport exp, pow
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
cimport cython

DTYPE_float = np.float64
ctypedef np.float64_t DTYPE_float_t

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def cy_jonswap(np.ndarray[DTYPE_float_t, ndim=1, mode ='c'] w, double Hs, double Tp, double gamma = 3.7):
    '''
    get Jonswap spectra
    :param w: np.ndarray Angular Frequency
    '''
    cdef:
        int n = w.shape[0]
        double *sigma = <double*>malloc(n * sizeof(double)) 
        double *a = <double*>malloc(n * sizeof(double)) 
        int i 
    cdef double wp
    cdef np.ndarray[DTYPE_float_t, ndim=1, mode='c'] sj = np.ones(n, dtype=DTYPE_float)

    wp = 2 * np.pi / Tp
    with nogil:
        for i in range(n):
            sigma[i] = 0.07 if w[i] < wp else 0.09
            a[i] = exp(-0.5 * pow((w[i] - wp) / (sigma[i] * w[i]), 2.0))
            sj[i] = 320 * pow(Hs, 2) * pow(w[i], -5.0) / pow(Tp, 4) * exp(-1950 * pow(w[i], -4) / pow(Tp, 4)) * pow(gamma, a[i])

    return sj

Timings

w = np.random.randn(1_000_000)

%timeit cy_jonswap(w, .5, .5)
289 ms ± 7.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit jonswap(w, .5, .5)
411 ms ± 26.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Also, note that in your cython version you are leaking memory for sigma and a

chrisb
  • 49,833
  • 8
  • 70
  • 70
  • 1
    Most of the slow pow calls are not necessary. eg Hs*Hs is much faster than pow(Hs,2). https://stackoverflow.com/a/53172561/4045774 It should also make quite some difference if you have pow(w[i], -5) instead of pow(w[i], -5.0) or even 1./(w[i]*w[i]*w[i]*w[i]*w[i]) – max9111 Nov 08 '18 at 16:19