Quick win
First, check whether your function is already implemented in NumPy. If so, it is probably a very fast compiled (C/C++) implementation.
This is the case for your function:
import numpy as np
x = np.array([1, 4, 9])
y = np.sqrt(x)
#> array([1., 2., 3.])
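As a quick sanity check, this sketch compares np.sqrt with a plain Python loop over math.sqrt. The helper name py_sqrt is only illustrative and not part of NumPy.

```python
import math
import numpy as np

def py_sqrt(values):
    """Element-wise square root with a plain Python loop (illustrative helper)."""
    return [math.sqrt(v) for v in values]

x = np.array([1, 4, 9])
y = np.sqrt(x)                      # vectorized: the loop runs in compiled code

assert y.dtype == np.float64        # the integer input is promoted to float64
assert np.allclose(y, py_sqrt(x))   # same values as the Python loop
```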
With Numpy arrays
Alternatively (following @joni's comment), you can use NumPy arrays just for the input/output and compute the function element-wise using C/C++:
cimport numpy as cnp
import numpy as np
from libc.math cimport sqrt
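cpdef cnp.ndarray[double, ndim=1] cy_sqrt_np(cnp.ndarray[double, ndim=1] x):
    cdef Py_ssize_t i, l = x.shape[0]
    # The buffer type must come from the cimported module (cnp), not from the Python-level np
    cdef cnp.ndarray[double, ndim=1] y = np.empty(l)
    for i in range(l):
        y[i] = sqrt(x[i])  # C sqrt from libc.math
    return y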
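For reference, the typed loop above does the same work as the following pure-Python version, which can be handy for checking the output of the compiled function. This is only a sketch; the name py_sqrt_np is hypothetical.

```python
import math
import numpy as np

def py_sqrt_np(x):
    """Pure-Python reference for the typed loop (hypothetical name)."""
    n = x.shape[0]
    y = np.empty(n)                 # uninitialized float64 output array
    for i in range(n):
        y[i] = math.sqrt(x[i])      # filled element by element
    return y

x = np.array([1.0, 4.0, 9.0])
print(py_sqrt_np(x))
```

Once the Cython module is compiled, np.allclose(cy_sqrt_np(x), py_sqrt_np(x)) should hold for float64 inputs.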
With C++ vectors
Lastly, here is a possible implementation with C++ vectors and automatic conversion from/to Python lists:
# distutils: language = c++
from libc.math cimport sqrt
from libcpp.vector cimport vector
cpdef vector[double] cy_sqrt_vec(vector[double] x):
    cdef Py_ssize_t i, l = x.size()
    cdef vector[double] y
    y.reserve(l)
    for i in range(l):
        y.push_back(sqrt(x[i]))
    return y
Some things to keep in mind in this case and the previous one:
- We initialize the y vector to be empty, and then allocate space for it with reserve(). According to SO this seems to be a good option.
- We use a typed i in the for loop, and use push_back to append new values.
- We use sqrt from libc.math to avoid using Python code inside the loop.
- We type the input of the function to be vector[double]. This automatically adds convenient type conversions from other Python types (e.g., a list of ints).
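To make the automatic conversions concrete, here is a rough pure-Python model of what happens at the boundary of cy_sqrt_vec. It is a sketch of the behaviour, not the code Cython generates, and the name sqrt_vec_model is hypothetical.

```python
import math

def sqrt_vec_model(x):
    """Pure-Python model of the list <-> vector[double] round trip (hypothetical helper)."""
    # Input conversion: each element is copied and coerced to a double.
    as_doubles = [float(v) for v in x]
    # Loop body: the same arithmetic as the compiled function.
    out = [math.sqrt(v) for v in as_doubles]
    # Output conversion: the result comes back as a new Python list of floats.
    return out

print(sqrt_vec_model([1, 4, 9]))    # list of ints in, list of floats out
```

Note that every element is copied once on the way in and once on the way out, so the conversions are convenient but not free.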
Time comparison
We define a random input x to avoid cached results polluting our measurements:
%%timeit -n 10000 -r 7 x = gen_x()
y = np.sqrt(x)
#> executed in 177ms, finished 16:16:57 2022-04-19
#> 2.3 µs ± 241 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit -n 10000 -r 7 x = gen_x()
y = x**.5
#> executed in 194ms, finished 16:16:51 2022-04-19
#> 2.46 µs ± 256 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit -n 10000 -r 7 x = gen_x()
y = cy_sqrt(x)
#> executed in 359ms, finished 16:17:02 2022-04-19
#> 4.9 µs ± 274 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit -n 10000 -r 7 x = list(gen_x())
y = cy_sqrt_vec(x)
#> executed in 2.85s, finished 16:17:11 2022-04-19
#> 40.4 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
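As expected, the np.sqrt version wins. The C++ vector version is by far the slowest (40.4 µs versus 2.3 µs per loop), which suggests that the vector allocation and the list conversions are comparatively expensive.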
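The timings above call gen_x, whose definition is not shown. Below is a minimal sketch, assuming it returns a 1-D array of random non-negative doubles (the size of 100 is an arbitrary assumption), together with a plain-Python equivalent of the %%timeit cells for use outside a notebook:

```python
import timeit
import numpy as np

def gen_x(n=100):
    """Hypothetical stand-in for gen_x: n random doubles in [0, 1)."""
    return np.random.default_rng().random(n)

# Mirrors `%%timeit -n 10000 -r 7 x = gen_x()`: the setup runs once per
# repeat and is not timed; the statement runs 10000 times per repeat.
number = 10000
runs = timeit.repeat(
    "np.sqrt(x)",
    setup="x = gen_x()",
    repeat=7,
    number=number,
    globals={"np": np, "gen_x": gen_x},
)
per_loop_us = [1e6 * t / number for t in runs]
print(f"best of 7: {min(per_loop_us):.2f} µs per loop")
```

Absolute numbers depend on the machine and on the input size, so only the relative ordering of the variants is meaningful.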