I am struggling to convert a Python str to C++ and back. For Python 2/3 compatibility, I thought that mapping the Python 2 PyString_* functions to their PyBytes_* equivalents on Python 3 (the defines below) would suffice.

Note this is extracted from a larger codebase; apologies for any missing imports.

// C++ stuff compiled to convertor.so
#include "Python.h"
#if PY_MAJOR_VERSION >= 3
    #define PyString_Size PyBytes_Size
    #define PyString_AsString PyBytes_AsString
    #define PyString_FromStringAndSize PyBytes_FromStringAndSize
#endif

template<typename T>
struct vec {
  T *ptr;
  int64_t size;
};

extern "C"
vec<uint8_t> str_to_char_arr(PyObject* in) {
  int64_t dimension = (int64_t) PyString_Size(in);
  vec<uint8_t> t;
  t.size = dimension;
  t.ptr = (uint8_t*) PyString_AsString(in);
  return t;
}

extern "C"
PyObject* char_arr_to_str(vec<uint8_t> inp) {
  Py_Initialize();
  PyObject* buffer = PyString_FromStringAndSize((const char*) inp.ptr, inp.size);
  return buffer;
}


# Python stuff
class Vec(Structure):
    _fields_ = [
        ("ptr", POINTER(c_wchar_p)),
        ("size", c_long),
    ]

lib = to_shared_lib('convertor')
lib_file = pkg_resources.resource_filename(__name__, lib)
utils = ctypes.PyDLL(lib_file)

str_to_char_arr = utils.str_to_char_arr
str_to_char_arr.restype = Vec()
str_to_char_arr.argtypes = [py_object]

encoded = str_to_char_arr('abc'.encode('utf-8'))

char_arr_to_str = utils.char_arr_to_str
char_arr_to_str.restype = py_object
char_arr_to_str.argtypes = [py_object.ctype_class]
result = ctypes.cast(encoded, ctypes.POINTER(Vec())).contents

decoded = char_arr_to_str(result).decode('utf-8')

Trying this with 'abc' on Python 3.5 yields '\x03\x00\x00', which clearly means something went wrong.

Can anyone spot the issue?

Dimebag

2 Answers

You might be expecting UCS-2 while your Python build is configured for UCS-4. See also "Building an UCS4 string buffer in python 2.7 ctypes".
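
One way to check which case applies is to ask the interpreter and ctypes directly. The following is a minimal sketch, not part of the original answer; the helper name `unicode_width_report` is made up for illustration. It relies on `sys.maxunicode` (0xFFFF on narrow Python 2 builds, 0x10FFFF on wide builds and on Python 3.3+) and on the size ctypes reports for `c_wchar`, which follows the platform's `wchar_t`.

```python
import ctypes
import sys


def unicode_width_report():
    """Collect the values that indicate how wide unicode characters are here.

    Hypothetical helper for illustration only.
    """
    return {
        # 0xFFFF means a narrow (UCS-2) build; 0x10FFFF means a wide build.
        "max_code_point": sys.maxunicode,
        # Size in bytes of the platform's wchar_t as seen by ctypes.
        "wchar_t_bytes": ctypes.sizeof(ctypes.c_wchar),
    }


if __name__ == "__main__":
    for key, value in unicode_width_report().items():
        print(key, value)
```

If the two values disagree with what the C++ side assumes, a mismatch of this kind would be a plausible source of garbled output.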

Andrey Belykh

I haven't managed to make this work for Python 2; perhaps someone who understands the unicode/str/bytes differences between the Python versions better can fix that. It also means the issue I have is probably in another package, which unfortunately I have no control over at the moment.

Nevertheless, here is some working code (for me) with Python 3.5 and clang 6.0.

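#include "Python.h"
#include <cstdint>  // int64_t, uint8_t

// On Python 3, map the Python 2 PyString_* names to their PyBytes_* equivalents.
#if PY_MAJOR_VERSION >= 3
    #define PyString_Size PyBytes_Size
    #define PyString_AsString PyBytes_AsString
    #define PyString_FromStringAndSize PyBytes_FromStringAndSize
#endif

// Pointer-plus-length pair, mirrored by the ctypes Structure on the Python side.
template<typename T>
struct vec {
  T *ptr;
  int64_t size;
};

// Returns a view into the bytes object's internal buffer (no copy is made).
// The pointer is only valid while the Python object `in` is alive.
extern "C"
vec<uint8_t> str_to_char_arr(PyObject* in) {
  int64_t dimension = (int64_t) PyString_Size(in);
  vec<uint8_t> t;
  t.size = dimension;
  t.ptr = (uint8_t*) PyString_AsString(in);
  return t;
}

// Copies inp.size bytes into a new bytes object and returns a new reference.
extern "C"
PyObject* char_arr_to_str(vec<uint8_t> inp) {
  Py_Initialize();  // no-op when the interpreter is already running
  PyObject* buffer = PyString_FromStringAndSize((const char*) inp.ptr, inp.size);
  return buffer;
}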


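# Python
from ctypes import *

import pkg_resources


class Vec(Structure):
    # Mirrors the C++ vec<uint8_t>. ptr is only handed back to C++ and never
    # dereferenced here, so its exact pointee type does not affect the layout.
    _fields_ = [
        ("ptr", POINTER(c_char_p)),
        ("size", c_int64),  # matches int64_t on every platform, unlike c_long
    ]


lib = 'test.so'
lib_file = pkg_resources.resource_filename(__name__, lib)
# PyDLL keeps the GIL held during calls, which Python C API functions require.
utils = PyDLL(lib_file)

str_to_char_arr = utils.str_to_char_arr
str_to_char_arr.restype = Vec
str_to_char_arr.argtypes = [py_object]

# Keep a reference to the bytes object: encoded.ptr points into its buffer.
data = 'Bürgermeister'.encode('utf-8')
encoded = str_to_char_arr(data)

char_arr_to_str = utils.char_arr_to_str
char_arr_to_str.restype = py_object
char_arr_to_str.argtypes = [Vec]

decoded = char_arr_to_str(encoded).decode('utf-8')
print(decoded)  # Bürgermeister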

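Changing c_char_p to c_wchar_p seems to have no effect; it still works. Presumably that is because Python never dereferences ptr here, and every pointer type has the same size, so the struct layout stays the same.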
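
The layout claim can be checked without the shared library. The sketch below is an illustration under stated assumptions, not the code from this answer: `VecChar`, `VecWChar`, `same_layout` and `round_trip` are hypothetical names, and a ctypes buffer stands in for the memory that the C++ side would point at. It compares the two struct declarations field by field, then reads the bytes back through the "wrongly" typed pointer.

```python
import ctypes


class VecChar(ctypes.Structure):
    # Pointer field declared with a char* element type.
    _fields_ = [("ptr", ctypes.POINTER(ctypes.c_char_p)), ("size", ctypes.c_int64)]


class VecWChar(ctypes.Structure):
    # Same fields, but the pointer is declared with a wchar_t* element type.
    _fields_ = [("ptr", ctypes.POINTER(ctypes.c_wchar_p)), ("size", ctypes.c_int64)]


def same_layout(a, b):
    """Return True if both Structure classes agree on size and field placement."""
    return ctypes.sizeof(a) == ctypes.sizeof(b) and all(
        getattr(a, name).offset == getattr(b, name).offset
        and getattr(a, name).size == getattr(b, name).size
        for name, _ in a._fields_
    )


def round_trip(text):
    """Store UTF-8 bytes behind a VecWChar pointer and read them back."""
    data = text.encode("utf-8")
    buf = ctypes.create_string_buffer(data, len(data))  # owns a copy of the bytes
    view = VecWChar(ctypes.cast(buf, ctypes.POINTER(ctypes.c_wchar_p)), len(data))
    # Read raw bytes from the stored address; the declared pointee type is ignored.
    address = ctypes.cast(view.ptr, ctypes.c_void_p).value
    return ctypes.string_at(address, view.size).decode("utf-8")


if __name__ == "__main__":
    print(same_layout(VecChar, VecWChar))
    print(round_trip("Bürgermeister"))
```

The declared element type would only start to matter if the Python side indexed or dereferenced ptr itself, in which case POINTER(c_uint8) would be the closest match for uint8_t*.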

Dimebag