
I'd like to call my cdef methods and improve the speed of my program as much as possible. I do not want to use cpdef (I explain why below). Ultimately, I'd like to access cdef methods (some of which return void) that are members of my Cython extensions.

I tried following this example, which gives me the impression that I can call a cdef function by making a Python (def) wrapper for it.

I can't reproduce these results, so I tried a different problem for myself (summing all the numbers from 0 to n).

Of course, I'm looking at the documentation, which says

The directive cpdef makes two versions of the method available; one fast for use from Cython and one slower for use from Python.

and later (emphasis mine),

This does slightly more than providing a python wrapper for a cdef method: unlike a cdef method, a cpdef method is fully overridable by methods and instance attributes in Python subclasses. It adds a little calling overhead compared to a cdef method.

So how does one use a cdef function without the extra calling overhead of a cpdef function?

With the code at the end of this question, I get the following results:

def/cdef:
273.04207632583245
def/cpdef:
304.4114626176919
cpdef/cdef:
0.8969507060538783

Somehow, cpdef is faster than cdef. For n < 100, I occasionally get cpdef/cdef > 1, but it's rare. I think it has to do with wrapping the cdef function in a def function. This is what the example I linked to does, but they claim better performance from using cdef than from using cpdef.

I'm pretty sure this is not how you wrap a cdef function while avoiding the additional overhead (the source of which is not clearly documented) of a cpdef.

And now, the code:

setup.py

from setuptools import setup, Extension
from Cython.Build import cythonize

pkg_name = "tmp"

compile_args = ['-std=c++17']

cy_foo = Extension(
        name=pkg_name + '.core.cy_foo',
        sources=[
            pkg_name + '/core/cy_foo.pyx',
        ],
        language='c++',
        extra_compile_args=compile_args,
)

setup(
    name=pkg_name,
    ext_modules=cythonize(cy_foo,
                          annotate=True,
                          build_dir='build'),
    packages=[
        pkg_name,
        pkg_name + '.core',
    ],
)

foo.py

def foo_def(n):
    sum = 0
    for i in range(n):
        sum += i
    return sum

cy_foo.pyx

def foo_cdef(n):
    return foo_cy(n)

cdef int foo_cy(int n):
    cdef int sum = 0
    cdef int i = 0
    for i in range(n):
        sum += i
    return sum

cpdef int foo_cpdef(int n):
    cdef int sum = 0
    cdef int i = 0
    for i in range(n):
        sum += i
    return sum

test.py

import timeit
from tmp.core.foo import foo_def
from tmp.core.cy_foo import foo_cdef
from tmp.core.cy_foo import foo_cpdef

n = 10000

# Python call
start_time = timeit.default_timer()
a = foo_def(n)
pyTime = timeit.default_timer() - start_time

# Call Python wrapper for C function
start_time = timeit.default_timer()
b = foo_cdef(n)
cTime = timeit.default_timer() - start_time

# Call cpdef function, which does more than wrap a cdef function (whatever that means)
start_time = timeit.default_timer()
c = foo_cpdef(n)
cpTime = timeit.default_timer() - start_time

print("def/cdef:")
print(pyTime/cTime)
print("def/cpdef:")
print(pyTime/cpTime)
print("cpdef/cdef:")
print(cpTime/cTime)
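A side note on cy_foo.pyx: `sum` is declared as a C `int`, and 0 + 1 + ... + (n-1) = n(n-1)/2 no longer fits in a signed 32-bit int once n exceeds 65536. With n = 10000 the result (49995000) is safely in range, so the timings above are unaffected, but larger n would likely give wrong answers; declaring `sum` as `long long` would push the limit much higher. The pure-Python sketch below emulates 32-bit two's-complement wraparound to show where the exact and the `int` results diverge. It assumes a 32-bit `int`, and because signed overflow is formally undefined behaviour in C, the wrapped value is only a typical outcome, not a guarantee. The function names are made up for illustration.

```python
def sum_python(n):
    """Exact sum of 0..n-1 using Python's arbitrary-precision ints."""
    total = 0
    for i in range(n):
        total += i
    return total


def wrap_int32(x):
    """Reduce x to a signed 32-bit value (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def sum_int32(n):
    """Emulate `cdef int sum` wrapping around after every addition."""
    total = 0
    for i in range(n):
        total = wrap_int32(total + i)
    return total


for n in (10000, 65536, 65537):
    print(n, sum_python(n), sum_int32(n))
```

For n = 65537 the exact sum is 2147516416, while the emulated 32-bit version comes out negative.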
Jonathon Reinhart
  • For cdef+def vs. cpdef see https://stackoverflow.com/a/48879057/5769463: the small overhead the documentation mentions is only paid in cdef classes. – ead Jan 14 '19 at 05:05
  • A better way to get meaningful timings is to use timeit.timeit(...): https://docs.python.org/3/library/timeit.html#timeit.timeit – ead Jan 14 '19 at 05:07
  • Also, when choosing the test function, keep in mind that the C compiler can do some quite clever optimizations. It is known that clang transforms the loop in your function into a simple multiplication (https://godbolt.org/z/jvlGQg), but I guess you don't use clang... – ead Jan 14 '19 at 05:16
  • So cpdef is actually less costly than cdef? – masterBuilderBenny Jan 14 '19 at 06:28
  • I think you've missed the point on a number of things. 1) Methods vs functions - some of the overhead you've described only applies to "methods" (functions attached to a class) and not to plain functions, which is what you're testing with. 2) `cdef` functions are quick to call from Cython but impossible to call from Python. The statement "It adds a little calling overhead compared to a cdef method" is comparing calling a `cdef` method from Cython with calling a `cpdef` method from Cython. Your measurements involve calling them (or a `def` wrapper) from Python. – DavidW Jan 14 '19 at 09:18
  • Can you point me to a primary source that explains this? – masterBuilderBenny Jan 14 '19 at 15:11
  • [This section of the documentation](https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#python-functions-vs-c-functions) is pretty good at explaining it I think. – DavidW Jan 14 '19 at 15:37
  • That section doesn’t make the distinction you’re making in your first point. – masterBuilderBenny Jan 14 '19 at 16:20
  • The first point is at least partly suggested in the documentation you quote - it comes from a section of documentation on extension types and says "a cpdef method is fully overridable by methods and instance attributes in Python subclasses", which can only really apply to methods and not functions. – DavidW Jan 14 '19 at 16:38
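To illustrate ead's point about compiler optimizations: the loop in foo_def, foo_cy and foo_cpdef computes 0 + 1 + ... + (n-1), which equals n(n-1)/2. An optimizing C compiler may replace such a loop with that closed form, and if it does, the timings mostly measure call overhead rather than the loop itself. The small pure-Python check below (the function names are illustrative) confirms the identity the compiler would be exploiting:

```python
def loop_sum(n):
    # Same loop as foo_def / foo_cy: add up 0, 1, ..., n-1.
    total = 0
    for i in range(n):
        total += i
    return total


def closed_form_sum(n):
    # Gauss' formula for 0 + 1 + ... + (n-1); the integer division is exact
    # because n*(n-1) is always even. range(n) is empty for n <= 0.
    return n * (n - 1) // 2 if n > 0 else 0


for n in (0, 1, 2, 100, 10000):
    assert loop_sum(n) == closed_form_sum(n)

print(closed_form_sum(10000))  # 49995000
```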

2 Answers


The reason for your seemingly anomalous result is that you aren't calling the cdef function foo_cy directly, but instead the def function foo_cdef wrapping it.


When you wrap the cdef function inside another def, you are still calling a Python function. However, you should be able to get results similar to the cpdef version. Here is what you could do: in the def wrapper, give types for both the input and the output:

def foo_cdef(int n):
    cdef int val = 0
    val = foo_cy(n)
    return val

This should give results similar to cpdef; however, you are again calling a Python function. If you want to call the C function directly, you have to do it from Cython code: a plain cdef function is not exported, so it cannot be reached from Python (ctypes included).

As for the benchmarking: the way you have written it, only a single run is measured, which can fluctuate a lot because of other OS tasks and the timer's resolution.

It is better to use the built-in timeit.timeit function and time many iterations:

# Python call
pyTime = timeit.timeit('foo_def(n)', globals=globals(), number=10000)

# Call Python wrapper for C function
cTime = timeit.timeit('foo_cdef(n)', globals=globals(), number=10000)

# Call cpdef function, which does more than wrap a cdef function (whatever that means)
cpTime = timeit.timeit('foo_cpdef(n)', globals=globals(), number=10000)

output:

def/cdef:
154.0166154428522
def/cpdef:
154.22669848136132
cpdef/cdef:
0.9986378296327566

This way you get consistent results, and the cpdef/cdef ratio is always close to 1, whether Cython generates the Python wrapper itself (cpdef) or we explicitly wrap the cdef function in a def function.
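If you want to go one step further, timeit.repeat runs the same measurement several times; the timeit documentation suggests looking at the minimum of the results, since higher values are usually caused by other processes interfering rather than by the code itself. A minimal sketch, using a pure-Python stand-in so it runs without the compiled extension; in test.py you would pass the imported foo_def, foo_cdef and foo_cpdef instead (best_time is a hypothetical helper name):

```python
import timeit


def foo_def(n):
    # Pure-Python stand-in for the functions being benchmarked.
    total = 0
    for i in range(n):
        total += i
    return total


def best_time(func, n, number=1000, repeat=5):
    """Return the smallest total time in seconds for `number` calls of
    func(n), taken over `repeat` independent trials."""
    timings = timeit.repeat(lambda: func(n), number=number, repeat=repeat)
    return min(timings)


n = 1000
number = 1000
py_time = best_time(foo_def, n, number=number)
print(f"foo_def: {py_time / number * 1e6:.2f} microseconds per call")
```

Ratios such as def/cdef can then be computed from the best_time results in the same way as in the snippet above.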

amirhm