
I am trying to get some Cython bindings to work where the external C code uses parameters of type char**, as is typical for main functions.

Unfortunately, all of my attempts so far have failed, and I could not find any resources on how this can actually be achieved. The existing solutions I found usually deal with arrays of numbers or require rewriting the original code.

How can a function taking char** parameters be called, preferably without having to modify the call semantics of the underlying C code I am interfacing with?


Example

# File setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("my_test.pyx", language_level=3)
)
# File my_test.pyx
from binding cimport add as _add, main as _main


def add(a, b):
    return _add(a, b)


def main(argc, argv):
    cdef char[:, ::1] argv_array = [b'fixed', b'values'] + [x.encode() for x in argv]
    return _main(argc + 2, &argv_array[0][0])
# File binding.pxd
cdef extern from "module1.c":
    int add(int a, int b)
    int main(int argc, char** argv)
// File module1.c
#include <stdio.h>

static int add(int a, int b) {
    return a + b;
}


int main(int argc, char** argv) {
    printf("Result: %d\n", add(40, 2));
    for (int i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}

Error message

(venv) user@host ~/path/to/directory $ python setup.py build_ext --inplace
Compiling my_test.pyx because it changed.
[1/1] Cythonizing my_test.pyx

Error compiling Cython file:
------------------------------------------------------------
...
    return _add(a, b)


def main(argc, argv):
    cdef char[:, ::1] argv_array = [x.encode() for x in argv]
    return _main(argc, &argv_array[0][0])
                      ^
------------------------------------------------------------

my_test.pyx:12:23: Cannot assign type 'char *' to 'char **'
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    ext_modules = cythonize("my_test.pyx", language_level=3)
  File "/home/user/path/to/directory/venv/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1127, in cythonize
    cythonize_one(*args)
  File "/home/user/path/to/directory/venv/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1250, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: my_test.pyx

Declaring ctypedef char* cchar_tp and using it as cdef cchar_tp[:, ::1] argv_array yields a different error message:

Invalid base type for memoryview slice: cchar_tp

epR8GaYuh
  • The answer I've linked to should also work for memoryviews as well as the `np.ndarray` syntax. Essentially a memoryview is actually a single array (hence just a pointer rather than a pointer-to-pointer), so you have to allocate an array of pointers to the start of each row of the memoryview – DavidW May 25 '22 at 17:25
  • @DavidW I'd say the specifics of `char**` arrays would deserve their own answer. The question you link deals just with `double**`. – ibarrond May 25 '22 at 17:35
  • I don't think there are any specifics to `char**`. You just change the types and it's the same – DavidW May 25 '22 at 18:49
  • With that said, if people want to vote to reopen it, then fair enough; I just don't personally think it's significantly different – DavidW May 25 '22 at 19:03
  • @DavidW I tried to use the approach from the linked question with different variations, but either compilation or execution fails. A working answer would be highly appreciated for this reason. – epR8GaYuh May 26 '22 at 04:46
  • For reference, the linked question was https://stackoverflow.com/questions/40754724/passing-numpy-arrays-in-cython-to-a-c-function-that-requires-dynamically-allocat – DavidW May 26 '22 at 05:55

1 Answer

The problem you're facing is that a 2D memoryview/array is not a pointer to pointers (because that's generally an awful way of storing an array). Instead, it is a single 1D block of memory plus sizes (and strides) describing each dimension. So &argv_array[0][0] is just a char* to the first character, which is exactly what the error message complains about. Note that char** (representing a "list" of strings) isn't quite the same as a 2D array, since the strings generally have different lengths.
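To see the layout difference without compiling anything, here is a minimal sketch that uses Python's standard ctypes module as a stand-in for a C-contiguous char[:, ::1] memoryview (ctypes appears nowhere in the question or this answer, and the row width and strings are made up). The 2D buffer is one block of characters, and its start address, the counterpart of &argv_array[0][0], is only a char*.

```python
import ctypes

ROWS, WIDTH = 3, 8  # hypothetical sizes; a char[:, ::1] memoryview also has one fixed row width

# One contiguous block of ROWS * WIDTH chars, analogous to a C-contiguous 2D memoryview.
grid = ((ctypes.c_char * WIDTH) * ROWS)()
for i, word in enumerate([b"fixed", b"values", b"arg"]):
    grid[i].value = word  # copied into row i and NUL-terminated inside that row

# The start address is all you get from &argv_array[0][0]: a plain char*.
base = ctypes.addressof(grid)

# Row i is found by arithmetic on that single address, not by loading a stored pointer.
offsets = [ctypes.addressof(grid[i]) - base for i in range(ROWS)]

print(offsets)              # [0, 8, 16]
print(ctypes.sizeof(grid))  # 24: only characters, no table of row pointers
```

Reinterpreting that start address as a char** would make the callee read the characters themselves as addresses.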

Therefore you must create a separate array of pointers, each of which points into your larger array. This is discussed in the question linked in the comments above, which I originally marked as a duplicate and still think probably is one. The approach there should still work.
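That extra step can be sketched with the same ctypes stand-in (grid and row_ptrs are illustrative names, not taken from the linked question): a separate array of char* receives the start address of each row, and that pointer array, not the buffer, is what a char** parameter expects.

```python
import ctypes

ROWS, WIDTH = 3, 8  # hypothetical sizes of the contiguous buffer

# The contiguous 2D buffer, standing in for the char[:, ::1] memoryview.
grid = ((ctypes.c_char * WIDTH) * ROWS)()
for i, word in enumerate([b"fixed", b"values", b"arg"]):
    grid[i].value = word

# The separate pointer array: entry i holds the start address of row i.
row_ptrs = (ctypes.c_char_p * ROWS)(*[ctypes.addressof(grid[i]) for i in range(ROWS)])

# Viewed through its first element, the pointer array is exactly a char**.
argv = ctypes.cast(row_ptrs, ctypes.POINTER(ctypes.c_char_p))
print([argv[i] for i in range(ROWS)])  # [b'fixed', b'values', b'arg']

# The pointers refer to the buffer itself, so edits to a row are visible through them.
grid[2].value = b"changed"
print(argv[2])  # b'changed'
```

In Cython the equivalent would be a malloc'd char** filled with &argv_array[i][0] for each row i. Each row must contain a terminating NUL, and the memoryview must stay alive while the C code uses the pointers.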

You can take one shortcut with Python bytes objects: they can be assigned directly to a const char*. The pointer just points into the Python-owned memory, so the bytes object must outlive the C pointer. Here I ensure that by keeping them in a list for the duration of the call.
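For comparison, ctypes (used here only as an illustration, not as part of the Cython solution) applies the same shortcut when a bytes object is stored in a char* slot: it keeps a pointer into the bytes object's own buffer instead of copying the string. This sketch checks that by comparing addresses. Unlike the Cython version, the ctypes array itself holds references to the bytes objects, so they cannot be freed while the array exists.

```python
import ctypes

args = [b"arg1", b"arg2"]

# Array of char* initialised straight from bytes objects: no string data is copied.
to_pass = (ctypes.c_char_p * len(args))(*args)

# Read back the raw addresses stored in the array...
stored = ctypes.cast(to_pass, ctypes.POINTER(ctypes.c_void_p))
# ...and the address of each bytes object's internal buffer.
buffers = [ctypes.cast(ctypes.c_char_p(a), ctypes.c_void_p).value for a in args]

print([stored[i] == buffers[i] for i in range(len(args))])  # [True, True]
```

In both cases the C code must treat the data as read-only, since Python bytes are immutable; that is why the pointers are declared const char*.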

from libc.stdlib cimport malloc, free

cdef extern from *:
    """
    int m(int n, const char** args) {
        return 1;
    }
    """
    int m(int n, const char** args)

def call_m():
    cdef const char** to_pass
    args = [b"arg1", b"arg2"]  # the list keeps the bytes objects alive during the call
    to_pass = <const char**>malloc(sizeof(const char*) * len(args))
    if not to_pass:
        raise MemoryError()
    try:
        for n, a in enumerate(args):
            to_pass[n] = a  # auto-conversion: points into the bytes object, no copy
        return m(len(args), to_pass)
    finally:
        free(to_pass)  # frees only the pointer array; the strings belong to the bytes objects
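The question's wrapper additionally prepends two fixed arguments. Here is a sketch of that argument handling, again with ctypes as a stand-in for Cython. fake_main and call_main are hypothetical; fake_main is a Python implementation of the signature int main(int argc, char** argv), so the example runs without compiling module1.c. The sketch computes argc from the final list instead of relying on the caller's argc + 2. It also appends a NULL entry, because the C standard guarantees argv[argc] == NULL for a real main, and code written as a main may rely on that.

```python
import ctypes

# C signature int main(int argc, char **argv), expressed as a ctypes prototype.
MainProto = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.POINTER(ctypes.c_char_p))

seen = []

def fake_main(argc, argv):
    # Hypothetical stand-in for module1.c's main: records argv[0..argc].
    for i in range(argc + 1):
        seen.append(argv[i])  # argv[argc] reads back as None (a NULL pointer)
    return 0

def call_main(c_main, argv):
    # Same argument list as the question: two fixed values, then the caller's strings.
    args = [b"fixed", b"values"] + [x.encode() for x in argv]
    # len(args) pointers plus a terminating NULL, mirroring a real main's argv.
    c_argv = (ctypes.c_char_p * (len(args) + 1))(*args, None)
    return c_main(len(args), c_argv)

result = call_main(MainProto(fake_main), ["a", "b"])
print(result, seen)  # 0 [b'fixed', b'values', b'a', b'b', None]
```

With Cython, the same idea means allocating len(args) + 1 pointers, setting the last one to NULL, and casting with <char**> when calling a function declared with a non-const char**, as the question's main is.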
DavidW