I'm trying to use Cython to speed up parts of my Python script. One key section applies functions to a pandas DataFrame; since this is done many times, I wanted to write these functions in Cython for faster calculations. The functions are below, both in the same Jupyter notebook cell:
%%cython
cimport numpy as np
import numpy as np

cdef double breadth_c_type(np.ndarray[np.float64_t, ndim=1] arr):
    """ Calculates range between the maximum and minimum values of a given list. """
    return (max(arr) - min(arr))

cdef double evenness_c_type(np.ndarray[np.float64_t, ndim=1] arr):
    """ Calculates the sample variance of differences between values in a sorted list. """
    cdef np.ndarray[double] sorted_arr
    cdef list desc_diff
    cdef double m
    cdef double var_res
    sorted_arr = sorted(arr)
    desc_diff = []
    for x in range(len(arr)-1):
        desc_diff.append(sorted_arr[x+1]-sorted_arr[x])
    # following used to avoid usage of numpy
    m = sum(desc_diff) / len(desc_diff)
    var_res = sum((xi - m)**2 for xi in desc_diff) / len(desc_diff)
    return var_res
The notebook cell runs without errors as written, so I assumed that both functions had compiled. Indeed, this call runs as expected:
%timeit rand_df.apply(breadth_c_type, raw=True)
whereas this code:
%timeit rand_df.apply(evenness_c_type, raw=True)
fails with "NameError: name 'evenness_c_type' is not defined". I get the same result without the %timeit magic, and the functions don't compile when I use 'cpdef' or 'def' in place of 'cdef'. Since I tried to follow the same syntax for both functions, I don't know what's causing the error for evenness_c_type.
EDIT
Thanks to @DavidW, I figured out the problems with the evenness_c_type() function. It compiles and runs well, although not as fast as the plain Cython version.
cdef double evenness_c_type(np.ndarray[np.float64_t, ndim=1] arr):
    """ Calculates the population variance of differences between values in a sorted list. """
    cdef np.ndarray[double] desc_diff = np.empty(len(arr)-1, dtype=np.float64)
    arr.sort()
    for x in range(len(arr)-1):
        desc_diff[x] = (arr[x+1]-arr[x])
    return np.var(desc_diff)
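
For checking results, a plain NumPy version of the same two calculations may be handy as a reference. This is only a sketch: the names breadth_ref, evenness_ref and sample are made up for illustration and are not part of the code above. It uses np.sort, which returns a sorted copy, instead of the in-place arr.sort().

```python
import numpy as np


def breadth_ref(arr):
    """Range between the maximum and minimum values (hypothetical reference helper)."""
    return float(np.max(arr) - np.min(arr))


def evenness_ref(arr):
    """Population variance of the gaps between consecutive sorted values."""
    # np.sort returns a sorted copy, so the caller's array is left unchanged
    # (arr.sort() would sort it in place).
    gaps = np.diff(np.sort(arr))
    # np.var defaults to ddof=0, i.e. the population variance.
    return float(np.var(gaps))


# Illustrative input, not taken from the original data.
sample = np.array([4.0, 1.0, 2.0, 8.0])
print(breadth_ref(sample))
print(evenness_ref(sample))
```

These helpers should be usable the same way as the functions above, e.g. rand_df.apply(evenness_ref, raw=True), which makes it possible to compare both the output and the timing against the Cython version.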