
I would like to avoid converting every Python string object between PyObject* and std::string or char* with PyUnicode_DecodeUTF8 and PyUnicode_AsUTF8, because the conversion is an expensive operation.

In my previous question, "How to extend/reuse Python C Extensions/API implementation?", I managed to use Python's open function to get a PyObject* string directly. With that, sending the string back to the Python program is very simple, because I can just pass its PyObject* pointer back instead of doing a full char-by-char copy, as PyUnicode_DecodeUTF8 or PyUnicode_AsUTF8 do.

In CPython's regex implementation, I found a function like this:

static void* getstring(PyObject* string, Py_ssize_t* p_length,
          int* p_isbytes, int* p_charsize,
          Py_buffer *view)
{
    /* given a python object, return a data pointer, a length (in
       characters), and a character size.  return NULL if the object
       is not a string (or not compatible) */

    /* Unicode objects do not support the buffer API. So, get the data directly. */
    if (PyUnicode_Check(string)) {
        if (PyUnicode_READY(string) == -1)
            return NULL;
        *p_length = PyUnicode_GET_LENGTH(string);
        *p_charsize = PyUnicode_KIND(string);
        *p_isbytes = 0;
        return PyUnicode_DATA(string);
    }

    /* get pointer to byte string buffer */
    if (PyObject_GetBuffer(string, view, PyBUF_SIMPLE) != 0) {
        PyErr_SetString(PyExc_TypeError, "expected string or bytes-like object");
        return NULL;
    }
    *p_length = view->len;
    *p_charsize = 1;
    *p_isbytes = 1;

    if (view->buf == NULL) {
        PyErr_SetString(PyExc_ValueError, "Buffer is NULL");
        PyBuffer_Release(view);
        view->buf = NULL;
        return NULL;
    }
    return view->buf;
}

It does not seem to use PyUnicode_DecodeUTF8 or PyUnicode_AsUTF8 to work with the PyObject* coming from the Python interpreter.

How can I perform basic string operations on PyObject* strings without converting them to std::string or char*?

By basic operations I mean the following examples. (Just for illustration, I am using Py_BuildValue to build a PyObject* string from a char* or std::string.)

static PyObject* PyFastFile_do_concatenation(PyFastFile* self)
{
    PyObject* hello = Py_BuildValue( "s", "Hello " );
    PyObject* word = Py_BuildValue( "s", "word" );

    // I am just guessing the `->value` property
    PyObject* hello_world = hello->value + word->value;
    hello_world; // return the `PyObject*` string `Hello word`
}

static PyObject* PyFastFile_do_substring(PyFastFile* self)
{
    PyObject* hello = Py_BuildValue( "s", "Hello word" );
    PyObject* hello_world = hello->value[6:];
    hello_world; // return the `PyObject*` string `word`
}

static PyObject* PyFastFile_do_contains(PyFastFile* self)
{
    PyObject* hello = Py_BuildValue( "s", "Hello word" );

    if( "word" in hello->value ) {
        Py_BuildValue( "p", true ); // return the `PyObject*` boolean `true`
    }
    Py_BuildValue( "p", false ); // return the `PyObject*` boolean `false`
}
Evandro Coan
  • Um… are you asking for a link to the `PyUnicode_*` API documentation? – Davis Herring May 27 '19 at 03:31
  • The link is [this](https://docs.python.org/3/c-api/unicode.html), but there are not many examples over there. – Evandro Coan May 27 '19 at 03:34
  • Why does it need examples? The relevant functions have pretty obvious names and behave in predictable ways. For example the concat function takes two `PyObject*` inputs and returns another `PyObject*` as output - what could be unclear about that? – DavidW May 28 '19 at 07:27
