
I have the following Cython code in a cdef class:

def __getstate__(self):

    cdef char *bp
    cdef size_t size
    cdef cn.FILE *stream

    stream = cn.open_memstream(&bp, &size)

    cn.write_padded_binary(self.im, self.n, 256, stream)
    cn.fflush(stream)

    cn.fclose(stream)

    print("pointer", bp, "size_t:", size)
    # ('pointer', b'', 'size_t:', 6144)
    bt = c.string_at(bp, size)
    print("bt", bt)

    cn.free(bp)

    return bt

However, the output of print("pointer", bp, "size_t:", size) and print("bt", bt) makes me worried that something is wrong. The pointer prints as just ('pointer', b'', 'size_t:', 6144), and the bytestring seems to contain text from Python source code:

x00\x00 Normalize an encoding name.\n\n Normalization works as follows: all non-alphanumeric\n characters except the dot used for Python package names are\n collapsed and replaced with a single underscore, e.g. \' -;#\'\n becomes \'_\'. Leading and trailing underscores are removed.\n\n Note that encoding names should be ASCII only; if they do use\n non-ASCII characters, these must be Latin-1 compatible.\n\n \x00\x00\

(Most of it is just escaped bytes, though.)

I am sure that write_padded_binary works, because it works when I give it a regular file descriptor. I am also sure that open_memstream works: when I replace write_padded_binary with cn.fprintf(stream, "hello"), the output is ('bt', b'hello'). However, the pointer then also prints as ('pointer', b'hello', 'size_t:', 5), so I think I must be misunderstanding something pointer-related...

The Unfun Cat

1 Answer


The issue you're having (diagnosed elsewhere) is that you can't pass a char* directly to Python functions. When you do, Cython attempts to convert it into a bytes object by treating it as a null-terminated C string. That doesn't make sense here, because bp is just holding binary data: the conversion reads an arbitrary length until it happens to find a 0 byte. Your buffer starts with a 0 byte, hence b''; the fprintf(stream, "hello") test produces a valid null-terminated string, hence b'hello'.
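The same null-terminated interpretation can be reproduced in plain Python with ctypes. This is a minimal sketch, not Cython, and the payload bytes are made up for illustration: a buffer that starts with a 0 byte reads back as b'' when treated as a char*, while ctypes.string_at given an integer address and an explicit length returns every byte.

```python
import ctypes

# Hypothetical binary payload that starts with a 0 byte, like the buffer in the question
payload = b"\x00\x01\x02\x00\x03"
buf = ctypes.create_string_buffer(payload)  # ctypes appends a trailing NUL terminator

# Treating the buffer as a char* (NUL-terminated string) stops at the first 0 byte
as_c_string = ctypes.cast(buf, ctypes.c_char_p).value

# Passing the address as a plain integer plus an explicit length copies every byte
address = ctypes.addressof(buf)
exact = ctypes.string_at(address, len(payload))

print(as_c_string)  # b''
print(exact)        # b'\x00\x01\x02\x00\x03'
```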

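The automatic conversion causes problems for both print and ctypes.string_at. In the string_at case, the converted (here empty) bytes object is passed instead of the address of your buffer, so string_at reads size bytes from unrelated interpreter memory, which is most likely where the docstring text comes from. The trick in both cases is to cast the pointer to an appropriately sized integer first. The C type uintptr_t is guaranteed to be large enough to hold a pointer, so it is the appropriate choice: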

from libc.stdint cimport uintptr_t  # at the top of the .pyx file

# inside __getstate__, after cn.fclose(stream):
print("pointer", <uintptr_t>bp, "size_t:", size)
bt = c.string_at(<uintptr_t>bp, size)
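The whole pattern can also be checked end to end from plain Python. The sketch below is a ctypes analogue of the fixed __getstate__, not Cython. It assumes a POSIX libc that provides open_memstream (glibc on Linux, for example), uses fwrite as a stand-in for write_padded_binary, and the helper name memstream_roundtrip is hypothetical. As in the fix above, bp is read back as an integer address and passed to string_at together with size, and only then freed.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX libc with open_memstream (e.g. glibc on Linux)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc.open_memstream.restype = ctypes.c_void_p
libc.open_memstream.argtypes = [ctypes.POINTER(ctypes.c_void_p),
                                ctypes.POINTER(ctypes.c_size_t)]
libc.fwrite.restype = ctypes.c_size_t
libc.fwrite.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                        ctypes.c_size_t, ctypes.c_void_p]
libc.fclose.argtypes = [ctypes.c_void_p]
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def memstream_roundtrip(payload):
    """Hypothetical helper: write payload to an open_memstream buffer and copy it out."""
    bp = ctypes.c_void_p()    # plays the role of `char *bp`
    size = ctypes.c_size_t()  # plays the role of `size_t size`
    stream = libc.open_memstream(ctypes.byref(bp), ctypes.byref(size))
    if not stream:
        raise OSError("open_memstream failed")
    # fwrite stands in for the question's write_padded_binary
    libc.fwrite(payload, 1, len(payload), stream)
    libc.fclose(stream)  # flushes the stream and updates bp and size
    try:
        # Integer address + explicit length: embedded 0 bytes are preserved
        return ctypes.string_at(bp.value, size.value)
    finally:
        libc.free(bp)  # after fclose, the caller owns the buffer


data = bytes(6144)  # 6144 zero bytes, the same size as in the question
copy = memstream_roundtrip(data)
print(len(copy))  # 6144
```

Because the length comes from size rather than from a search for a terminating 0, the result has the full length even though every byte is zero, and free runs only after the data has been copied into a Python bytes object.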
DavidW