I am writing a Python3 extension module for an existing C++ library which returns a string that appears to be in cp1252 encoding. The C++ function signature is

int get_name(std::string& name);

where name is the output variable. Its c_str() contents are bytes like 0xB0 0x46 0x00: the degree symbol in code page cp1252, followed by an uppercase F and the terminating NUL character.

In my Python extension C++ code, I wrote

std::string name;
int retval = get_name(name);
py_retval = Py_BuildValue((char *) "is#", retval, (name).c_str(), (name).size());

However, this causes the following runtime exception

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 0: invalid start byte

What is the correct way for me to return a cp1252-encoded string to Python?

UPDATE: I figured out that if I use y# instead of s# to return a Python bytes object from the extension, I can convert that bytes object to a string in my Python code with .decode('cp1252'). However, this is an extra step in Python that should be automated in the extension module. Unfortunately, I cannot figure out how.

Anger Density
Paul Grinberg
  • I don't know enough about the Python C API to give you the code, but can't you build a `bytes` like you said and then call `.decode` *from C*, through the Python API, to get your final Python retvalue? – Linuxios Aug 01 '19 at 20:56
  • @Linuxios - that's exactly what I want to do (see my Update), but I cannot figure out the Python Extension Module C syntax for doing that. – Paul Grinberg Aug 01 '19 at 20:58
  • Might help: https://stackoverflow.com/a/3310608/1008938 and https://docs.python.org/3/c-api/ – Linuxios Aug 01 '19 at 21:01
  • Maybe write simple python wrapper around this module which only will be changing encoding? – Grzegorz Bokota Aug 01 '19 at 22:06

1 Answer

PyUnicode_Decode can do this job for any standard encoding without even having to make a bytes object first. (You can pass its result to Py_BuildValue with format code N to avoid worrying about reference counts, although that trick doesn’t apply in all cases.)

Davis Herring