I'm using a binary Python library that returns a Buffer object. This object is basically a wrapper around a C object that contains a pointer to the actual memory buffer. I need to get the memory address held in that pointer from Python. The problem is that the Buffer object has no Python method to obtain it, so I need some hacky trick to get it.

For the moment, I have found an ugly and unsafe way to get the pointer value.

I know the internal structure of the C object:

typedef struct _Buffer {
  PyObject_VAR_HEAD
  PyObject *parent;

  int type; /* GL_BYTE, GL_SHORT, GL_INT, GL_FLOAT */
  int ndimensions;
  int *dimensions;

  union {
    char *asbyte;
    short *asshort;
    int *asint;
    float *asfloat;
    double *asdouble;

    void *asvoid;
  } buf;
} Buffer;

So I wrote this Python code:

import ctypes
import struct
import sys

# Offset of buf, counted from the end of PyObject_VAR_HEAD:
# + 8 bytes from PyObject *parent
# + 4 bytes from int type
# + 4 bytes from int ndimensions
# + 8 bytes from int *dimensions
# = 24
# sys.getsizeof(0) stands in for the size of PyObject_VAR_HEAD
offset = sys.getsizeof(0) + 24

# `buffer` is the Buffer object returned by the library
buffer_pointer_addr = id(buffer) + offset
buffer_pointer_data = ctypes.string_at(buffer_pointer_addr, 8)
buffer_pointer_value = struct.unpack('Q', buffer_pointer_data)[0]

This works consistently for me. As you can see, I get the memory address of the Python Buffer object with id(buffer). As you may know, that is not the pointer to the actual buffer: it is just a Python number that, in CPython, happens to be the memory address of the Python object.

Then I add the offset, which I calculated by summing the sizes of all the variables in the C struct. I'm hardcoding the byte sizes (which is obviously completely unsafe), except for PyObject_VAR_HEAD, whose size I get with sys.getsizeof(0).

Adding the offset gives me the memory address that holds the pointer to the actual buffer. I read it with ctypes.string_at, hardcoding the pointer size as 8 bytes (I'm on a 64-bit OS), and then use struct.unpack to convert it to a Python int.
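As a small variation on that last step, the pointer size does not have to be written out as 8: the struct module's native 'P' format code describes a void *, and struct.calcsize reports its size on the running interpreter. A minimal sketch, where read_pointer_at is a hypothetical helper name and the demo reads from a ctypes variable rather than from a real Buffer:

```python
import ctypes
import struct


def read_pointer_at(address):
    """Read one native pointer-sized value stored at `address`."""
    size = struct.calcsize("P")          # size of void * on this build
    raw = ctypes.string_at(address, size)
    return struct.unpack("P", raw)[0]    # 'P' is only valid in native mode


# Demo on memory we own: a void * variable holding an arbitrary value.
slot = ctypes.c_void_p(0x1234)
pointer_value = read_pointer_at(ctypes.addressof(slot))
print(hex(pointer_value))
```

This only removes the hardcoded pointer size; the offset of the field inside the object is still an assumption about the struct layout.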

So my question is: is there a safer way to do this without hardcoding all the sizes? Maybe something with ctypes? It's OK if it only works on CPython.

ciclopez
  • If you only need to support CPython, then using `id()` is ok. See: https://stackoverflow.com/questions/121396/accessing-object-memory-address – Marco Bonelli Oct 22 '20 at 16:30
  • Thanks, yes I guessed that. What about getting the sizes of all those C variable types instead of hardcoding them? – ciclopez Oct 22 '20 at 16:44
  • Saw `short *asshort;` and had to laugh and ask "What kind of _hort_ "? ;-) – chux - Reinstate Monica Oct 22 '20 at 16:44
  • @ciclopez you can't really do much about the offset. Even knowing the sizes there can be padding between structure fields. You can use `sizeof` from `ctypes` for the sizes of certain data types. – Marco Bonelli Oct 22 '20 at 16:47
  • I really want to know why you need the pointer from Python code. – nategoose Oct 22 '20 at 16:57
  • @MarcoBonelli isn't there some way of accessing the actual C Object variables from Python by using ctypes? – ciclopez Oct 22 '20 at 17:10
  • @ciclopez how? Once a C program is compiled, you lose that information. You don't have information on the types that compose a struct just by "looking" at it. You might manage to do something with debugging information embedded in the binary file (if any, which I doubt), but that's a long and painful process. – Marco Bonelli Oct 22 '20 at 17:12
  • @nategoose Basically I want to compress a sequence of images really fast. They are generated by a piece of software that outputs them as the mentioned Buffer object, so I need to pass this buffer to another library that I made in Cython. I don't want to copy the buffer as that's very inefficient, especially if it has to be converted to another Python object (they are big images at a high frame rate), so I'm passing the pointer and working with that. – ciclopez Oct 22 '20 at 17:20
  • @ciclopez: Why not just make the Cython library know how to process the struct _Buffer wrapper object? That seems like the cleanest way to do this. – nategoose Oct 22 '20 at 17:23
  • @nategoose It's a bit more complicated than what I explained, I tried to simplify it. I can't do that because the library I'm writing is intended to work as a plug-in for different software packages, and each one generates the image buffers in a different way. In some of them I have direct access to the buffer pointer, but in the one I'm dealing with now I have this problem. This Buffer object has a method to return the values as a Python list, but that is extremely inefficient, so I'm trying to get the pointer. – ciclopez Oct 22 '20 at 17:29
  • @MarcoBonelli I finally found a safer solution with your hint about `ctypes.sizeof()` and investigating about C Struct padding, I posted an answer. Thank you. – ciclopez Oct 23 '20 at 13:42
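
The padding caveat raised in the comments can be observed from Python itself, since ctypes lays out a Structure using the platform's native C alignment rules. A minimal sketch with a made-up two-field struct (Padded is purely illustrative and has nothing to do with the real Buffer layout):

```python
import ctypes


class Padded(ctypes.Structure):
    # Hypothetical struct: a 1-byte field followed by an int.
    _fields_ = [
        ("flag", ctypes.c_char),
        ("value", ctypes.c_int),
    ]


# Sum of the member sizes, ignoring alignment.
naive_size = ctypes.sizeof(ctypes.c_char) + ctypes.sizeof(ctypes.c_int)

# What the native layout rules actually produce.
value_offset = Padded.value.offset
real_size = ctypes.sizeof(Padded)

print("naive size:", naive_size)
print("offset of value:", value_offset)
print("real size:", real_size)
```

On platforms where int is aligned to more than one byte, value does not start right after flag, which is why summing member sizes by hand is not enough to get a field offset.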

1 Answer

I found a safer solution after looking into C struct padding. It is based on the following assumptions:

  • The code will only be used on CPython.
  • The buffer pointer is at the end of the C struct.
  • The size of the buffer pointer can safely be taken from the void * C type, since a union is as large as its largest member and every member here is a data pointer. In any case, data pointer types all have the same size on most modern platforms.
  • The C struct members are exactly the ones shown in the question.

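Given these assumptions and the rules described here: https://stackoverflow.com/a/38144117/8861787, there should be no padding at the end of the struct on common platforms: the pointer is the last member and no other member has a stricter alignment, so the struct already ends on an aligned boundary. We can therefore extract the pointer without hardcoding anything: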

import ctypes
import sys

# Get the size of the Buffer Python object
# (`buffer` is the Buffer object returned by the library)
buffer_obj_size = sys.getsizeof(buffer)

# Get the size of the void * C type
buffer_pointer_size = ctypes.sizeof(ctypes.c_void_p)

# Calculate the address of the pointer, assuming that it's at the end of the C struct
buffer_pointer_addr = id(buffer) + buffer_obj_size - buffer_pointer_size

# Get the actual pointer value as a Python int (None if the pointer is NULL)
buffer_pointer_value = ctypes.c_void_p.from_address(buffer_pointer_addr).value
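
Another option, sketched here and not tested against the real library, is to mirror the C struct from the question as a ctypes.Structure and let ctypes compute the field offsets, padding included. BufferStruct, read_buf_pointer and buffer_pointer are hypothetical names. The sketch assumes a regular (non-debug, non-free-threaded) CPython build, where PyObject_VAR_HEAD expands to ob_refcnt, ob_type and ob_size, and it declares the union as a single void * because every member is a data pointer:

```python
import ctypes


class BufferStruct(ctypes.Structure):
    # Assumed mirror of the C struct shown in the question.
    _fields_ = [
        # PyObject_VAR_HEAD on a regular CPython build
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        # Buffer members
        ("parent", ctypes.c_void_p),
        ("type", ctypes.c_int),
        ("ndimensions", ctypes.c_int),
        ("dimensions", ctypes.POINTER(ctypes.c_int)),
        ("buf", ctypes.c_void_p),  # stands in for the union of pointers
    ]


def read_buf_pointer(address):
    """Return the buf pointer of a Buffer-shaped struct at `address`."""
    return BufferStruct.from_address(address).buf


def buffer_pointer(obj):
    """CPython only: id(obj) is the address of the object's C struct."""
    return read_buf_pointer(id(obj))


# Demo on memory we own instead of a real Buffer object.
fake = BufferStruct()
fake.buf = 0xDEAD0000
print(hex(read_buf_pointer(ctypes.addressof(fake))))
print("offset of buf:", BufferStruct.buf.offset)
```

With the real object, the call would be buffer_pointer(buffer). Unlike the sys.getsizeof approach, this does not depend on the reported object size, but it breaks silently if the library's struct differs from the declaration above.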
ciclopez