Below, I've rewritten my previous example, which used C++ constructs, so that it only uses C and pybind11 ones:
```cpp
#include <pybind11/pybind11.h>
#include <stdio.h>
#include <stdlib.h>   // malloc, free

#if PY_VERSION_HEX < 0x03000000
#define MyPyText_AsString PyString_AsString
#else
#define MyPyText_AsString PyUnicode_AsUTF8
#endif

namespace py = pybind11;

int run(py::object pyargv11) {
    int argc = 0;
    char** argv = NULL;

    PyObject* pyargv = pyargv11.ptr();
    if (PySequence_Check(pyargv)) {
        Py_ssize_t sz = PySequence_Size(pyargv);
        if (sz < 0)                        // size query failed; error is set
            throw py::error_already_set();
        argc = (int)sz;
        // sz + 1: avoids malloc(0) for an empty sequence (which may return
        // NULL) and leaves room for the NULL terminator main()'s argv has
        argv = (char**)malloc((sz + 1) * sizeof(char*));
        if (!argv) {
            PyErr_NoMemory();
            throw py::error_already_set();
        }
        argv[sz] = NULL;
        for (Py_ssize_t i = 0; i < sz; ++i) {
            PyObject* item = PySequence_GetItem(pyargv, i);  // new reference
            argv[i] = item ? (char*)MyPyText_AsString(item) : NULL;
            // for lists and tuples the sequence still holds item, so the
            // C string stays valid after this DECREF
            Py_XDECREF(item);
            if (!argv[i] || PyErr_Occurred()) {
                free(argv);
                argv = NULL;
                break;
            }
        }
    }

    if (!argv) {
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_TypeError, "could not convert input to argv");
        throw py::error_already_set();
    }

    for (int i = 0; i < argc; ++i)
        fprintf(stderr, "%s\n", argv[i]);

    free(argv);
    return 0;
}

PYBIND11_MODULE(example, m) {
    m.def("run", &run, "runs the example");
}
```
Below, I walk through it piece by piece, with comments explaining what I'm doing and why.

In Python 2, string objects are `char*`-based; in Python 3, they are Unicode-based. Hence the following macro, `MyPyText_AsString`, which changes behavior based on the Python version, since we need to get to a C-style `char*`:

```cpp
#if PY_VERSION_HEX < 0x03000000
#define MyPyText_AsString PyString_AsString
#else
#define MyPyText_AsString PyUnicode_AsUTF8
#endif
```
The `pyargv11` argument is a `py::object`, which is a thin wrapper around a Python C-API object handle; since the following code uses the Python C-API, it's easier to deal with the underlying `PyObject*` directly:
```cpp
void closed_func_wrap(py::object pyargv11) {
    int argc = 0;        // the length that we'll pass
    char** argv = NULL;  // array of pointers to the strings

    // convert input list to C/C++ argc/argv:
    PyObject* pyargv = pyargv11.ptr();
```
The code only accepts containers that implement the sequence protocol, i.e. that can be indexed and have a known length. This covers the two most important ones, `PyTuple` and `PyList`, at the same time (albeit a tad slower than checking for those types directly, but it keeps the code more compact). To be fully generic, this code should also support the iterator protocol (e.g. for generators) and probably reject `str` objects, since a `str` is itself a sequence and `"abc"` would silently become the three arguments `a`, `b` and `c`. Both cases are unlikely, though.
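```cpp
    if (PySequence_Check(pyargv)) {
```

Okay, we have a sequence; now get its size. (This step is why generators and other iterables would need the Python iterator protocol instead: their size is typically not known up front, although you can request a hint with `PyObject_LengthHint`.) `PySequence_Size` returns -1, with an exception set, on failure, so check for that:

```cpp
        Py_ssize_t sz = PySequence_Size(pyargv);
        if (sz < 0)
            throw py::error_already_set();
```

That's one part done; store the size in the variable that will be passed on to other functions:

```cpp
        argc = (int)sz;
```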
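Now allocate the array of `char*` pointers (technically the strings are `const char*`, but that doesn't matter here, as we'll cast the `const` away). The one extra slot ensures that an empty sequence doesn't turn into `malloc(0)`, which may return NULL and would then be mistaken for a failure further down:

```cpp
        argv = (char**)malloc((sz + 1) * sizeof(char*));
```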
Next, loop over the sequence to retrieve the individual elements:

```cpp
        for (Py_ssize_t i = 0; i < sz; ++i) {
```

This gets a single element from the sequence. `PySequence_GetItem` is equivalent to Python's `seq[i]` (a `__getitem__` call) and returns a new reference:

```cpp
            PyObject* item = PySequence_GetItem(pyargv, i);
```
This is where the `MyPyText_AsString` macro comes in: it picks the right call to get a C-style `char*` out of either a Python 2 (`char*`-based) or a Python 3 (Unicode-based) string.

The cast from `const char*` to `char*` is acceptable only as long as nobody writes through `argv[i]`, because the buffer belongs to the Python string object. Note that this is a stricter contract than for `main()`, whose `argv` strings the C standard allows the program to modify, so I'm assuming the functions you pass `argv` to only read them.

Note also that the C string is NOT copied. In Py2, you simply get access to the underlying data, and in Py3, the UTF-8 conversion is cached as a data member of the Unicode object; in both cases, Python does the memory management. For lists and tuples, which hold references to their items, the strings are therefore guaranteed to live at least as long as the input Python object (`pyargv11`), i.e. at least for the duration of this function call. If other functions decide to keep the pointers, copies would be needed.

`PySequence_GetItem` itself returns NULL if it fails, so guard against that before converting:

```cpp
            argv[i] = item ? (char*)MyPyText_AsString(item) : NULL;
```

The result of `PySequence_GetItem` was a new reference, so now that we're done with it, drop it (`Py_XDECREF` rather than `Py_DECREF`, because `item` may be NULL):

```cpp
            Py_XDECREF(item);
```
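When the functions you pass `argv` to do keep the pointers (or write through them), the strings need to be copied. One way to do that is sketched below in plain C++, assuming the strings have already been collected into a `std::vector<std::string>` (in the binding, e.g. by constructing a `std::string` from each `MyPyText_AsString` result). `OwnedArgv` is a hypothetical helper, not part of pybind11 or the Python C-API:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of pybind11 or the Python C-API): keeps its
// own copies of the argument strings, so the char* pointers it hands out stay
// valid and writable regardless of what happens to the Python objects.
class OwnedArgv {
public:
    explicit OwnedArgv(const std::vector<std::string>& args) : storage_(args) {
        ptrs_.reserve(storage_.size() + 1);
        for (std::string& s : storage_)
            ptrs_.push_back(&s[0]);  // contiguous, NUL-terminated buffer (C++11)
        ptrs_.push_back(nullptr);    // argv[argc] == NULL, as for main()
    }

    // Copying would leave the pointers aimed at the other object's strings.
    OwnedArgv(const OwnedArgv&) = delete;
    OwnedArgv& operator=(const OwnedArgv&) = delete;

    int argc() const { return static_cast<int>(storage_.size()); }
    char** argv() { return ptrs_.data(); }

private:
    std::vector<std::string> storage_;  // owned copies of each argument
    std::vector<char*> ptrs_;           // argv-style view into storage_
};
```

Usage would be along the lines of `OwnedArgv a(strings); some_c_main(a.argc(), a.argv());`, where `some_c_main` is a placeholder for whatever C function consumes `argc`/`argv`. The strings then live as long as `a`, independently of the Python objects.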
The input sequence may contain objects other than Python `str`. In that case, the conversion fails (returning NULL with a Python exception set), and we need to check for that, or `closed_function` may segfault:

```cpp
            if (!argv[i] || PyErr_Occurred()) {
```
Clean up the previously allocated memory:

```cpp
                free(argv);
```

Set `argv` to NULL for the success check later on:

```cpp
                argv = NULL;
```

Give up on the loop (and close the open blocks):

```cpp
                break;
            }
        }
    }
```
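The all-or-nothing cleanup in this loop can be factored into a small function. Below is a sketch in plain C++ that compiles without Python; `build_argv` and the `ConvertFn` callback are hypothetical, with a NULL return from the callback standing in for a failed `MyPyText_AsString`:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical converter type standing in for MyPyText_AsString: it returns
// NULL when an element cannot be turned into a C string.
using ConvertFn = const char* (*)(const void* item);

// Builds a malloc'ed, NULL-terminated argv from n opaque items. If any
// conversion fails, the partially filled array is freed and NULL is returned,
// so the caller sees either a complete argv or nothing at all.
char** build_argv(const void* const* items, std::size_t n, ConvertFn convert) {
    // n + 1 slots: room for the terminator, and never a malloc(0)
    char** argv = static_cast<char**>(std::malloc((n + 1) * sizeof(char*)));
    if (!argv)
        return nullptr;
    for (std::size_t i = 0; i < n; ++i) {
        argv[i] = const_cast<char*>(convert(items[i]));
        if (!argv[i]) {        // conversion failed: undo the allocation, bail
            std::free(argv);
            return nullptr;
        }
    }
    argv[n] = nullptr;
    return argv;
}
```

The caller owns the returned array and releases it with `std::free`; the strings themselves are not copied, so the same lifetime caveats as for `MyPyText_AsString` apply.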
If the given object was not a sequence, or if one of the elements of the sequence was not a string, then we don't have an `argv`, and so we bail:

```cpp
    if (!argv) {
```
The following is a bit lazy, but probably easier to understand if all you want to look at is C code:

```cpp
        fprintf(stderr, "argument is not a sequence of strings\n");
        return;
    }
```
What you should really do is check whether an error was already set (e.g. because of a conversion problem), set one if not, and then notify pybind11 of it. This gives you a clean Python exception on the caller's end:

```cpp
        if (!PyErr_Occurred())
            PyErr_SetString(PyExc_TypeError, "could not convert input to argv");
        throw py::error_already_set();  // by pybind11 convention
    }
```
Alright, if we get here, then we have an `argc` and an `argv`, so now we can use them:

```cpp
    for (int i = 0; i < argc; ++i)
        fprintf(stderr, "%s\n", argv[i]);
```
Finally, clean up the allocated memory:

```cpp
    free(argv);
}
```
Notes:
- I would still advocate using at least `std::unique_ptr`, as it makes life much easier if C++ exceptions are thrown (e.g. from custom converters of any input object).
- I was originally expecting to be able to replace all of the code with the one-liner `std::vector<char*> pv{pyargv11.cast<std::vector<char*>>()};` after `#include <pybind11/stl.h>`, but I found that this does not work (even though it compiles). Neither did using `std::vector<std::string>` (which also compiles, but also fails at run time).
Just ask if anything is still unclear.
EDIT: If you truly only want a `PyListObject`, just call `PyList_Check(pyargv11.ptr())` and, if it returns true, cast the pointer: `PyListObject* pylist = (PyListObject*)pyargv11.ptr();`. Alternatively, if you want to work with `py::list`, you can use the following code:
```cpp
#include <pybind11/pybind11.h>
#include <stdio.h>
#include <stdlib.h>   // malloc, free

#if PY_VERSION_HEX < 0x03000000
#define MyPyText_AsString PyString_AsString
#else
#define MyPyText_AsString PyUnicode_AsUTF8
#endif

namespace py = pybind11;

int run(py::list inlist) {
    int argc = (int)inlist.size();
    char** argv = (char**)malloc(argc * sizeof(char*));
    for (int i = 0; i < argc; ++i) {
        argv[i] = (char*)MyPyText_AsString(inlist[i].ptr());
        if (!argv[i])                       // not a str: a Python error is set
            throw py::error_already_set();  // NOTE: argv leaks here (see below)
    }
    for (int i = 0; i < argc; ++i)
        fprintf(stderr, "%s\n", argv[i]);
    free(argv);
    return 0;
}

PYBIND11_MODULE(example, m) {
    m.def("run", &run, "runs the example");
}
```
This code is shorter only because it has less functionality: it only accepts lists, and its error handling is clunkier (e.g. it will leak `argv` if passed a list of integers, due to pybind11 throwing an exception; to fix that, use `std::unique_ptr` as in my earlier C++ version, so that `argv` is freed on exception).
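A minimal sketch of that fix, in plain C++17 so it compiles without Python: `Item` (a `std::variant`) stands in for a list element, and the hypothetical `as_cstr` throws on non-strings the way the pybind11 version raises `py::error_already_set`. These stand-ins are assumptions for illustration, not pybind11 API:

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Stand-in for a Python list element (a str or an int); an assumption made
// so the sketch compiles without Python or pybind11.
using Item = std::variant<std::string, int>;

// Hypothetical converter: like MyPyText_AsString, but it throws instead of
// returning NULL, much as pybind11 surfaces errors as C++ exceptions.
const char* as_cstr(const Item& item) {
    if (const std::string* s = std::get_if<std::string>(&item))
        return s->c_str();
    throw std::invalid_argument("list element is not a string");
}

// Same shape as the py::list version of run(), but argv is owned by a
// std::unique_ptr<char*[]>: delete[] runs on every exit path, including when
// as_cstr throws half-way through the loop, so nothing leaks.
int run_list(const std::vector<Item>& inlist) {
    const int argc = static_cast<int>(inlist.size());
    std::unique_ptr<char*[]> argv(new char*[argc + 1]);
    for (int i = 0; i < argc; ++i)
        argv[i] = const_cast<char*>(as_cstr(inlist[i]));
    argv[argc] = nullptr;
    for (int i = 0; i < argc; ++i)
        std::fprintf(stderr, "%s\n", argv[i]);
    return argc;  // returned (instead of 0) only to make the sketch testable
}
```

A `std::vector<char*> argv(argc + 1);` would give the same no-leak guarantee; `std::unique_ptr<char*[]>` is shown because it is the tool recommended in the notes.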