
I have the following ctypes Structure classes.

import ctypes as ct

class my_array(ct.Structure):
    _fields_ = [("_data", ct.POINTER(ct.c_uint32)),
                ("_size", ct.c_size_t)]

    def __init__(self, data):
        self._data = (ct.c_uint32 * len(data))()
        for i in range(0, len(data)):
            self._data[i] = data[i]
        self._size = len(data)

    def __reduce__(self):
        data = [None]*self._size
        for j in xrange(0, self._size):
            data[j] = self._data[j]
        return (my_array, (data,))


class my_struct(ct.Structure):
    _fields_ = [("numbs", my_array)]

However, when I pickle the second object

import cPickle
a = my_array([1, 2, 10])
b = my_struct(a)

with open('myfile', 'wb') as f:
    cPickle.dump(a, f)  # This succeeds
    cPickle.dump(b, f)  # This fails

I get the exception

Traceback (most recent call last):
  File "tmp.py", line 30, in <module>
    cPickle.dump(b, f)  # This fails
ValueError: ctypes objects containing pointers cannot be pickled

I don't understand why this is happening, since I implemented __reduce__ in my_array. Implementing __getstate__ did not work either.

I know I can override __reduce__ again in my_struct, but this seems overly complicated to me, since I would then have to keep overriding __reduce__ every time I include my_array in a structure.

Tohiko
  • I think you need to dereference that pointer: https://stackoverflow.com/questions/1555944/how-to-dereference-a-memory-location-from-python-ctypes – Dan D. Apr 06 '18 at 14:57
  • Do you mean when pickling? My `__getstate__` has no problems (if I call it, I get the correct results). The problem is that the pickling method is not calling my custom method at all, instead throwing this exception. – Tohiko Apr 06 '18 at 15:05
  • I can look at this when I’m not on a phone, but maybe you can test faster: the pickle protocol for classes is a complicated pile of “See if there’s a function in the registry; if not, try this method; if that fails, try this other one; if that fails, check if these two both exist and call the first one; etc.” So if `ctypes.Structure` has support for one of the earlier-checked pickling methods, you need to override that (or something even earlier) or your code is never going to get called. Read the docs on the pickle module to see exactly what the rule is for what gets tried in what order. – abarnert Apr 06 '18 at 15:23
  • Also, can you post the entire traceback? It should actually be telling you which pickling method was being called, which will immediately tell you which options you have for overriding it without diving into the ctypes code or experimenting trial-and-error. – abarnert Apr 06 '18 at 15:25
  • @abarnert, using `__reduce__` I figured out how to pickle `my_array`, but now I have a problem when I include this structure in another one. I updated my question to reflect this. – Tohiko Apr 06 '18 at 15:37
  • I think this new question is different enough that you should have considered writing and accepting an answer to the original question, then posting a new one, instead of editing. (And you should still consider reverting and doing that.) It depends on whether you think other people are likely to run into your original issue (and whether you want a bit of rep points from having a good question and a good answer…). – abarnert Apr 06 '18 at 16:52
  • Meanwhile, for your new question: this problem screams “inheritance”. Can you make a `class PicklableStructure(ctypes.Structure):` with a `__reduce__` that recursively reduces all of its fields that are `PicklableStructure` instances? That might turn out to be less trivial than it sounds, or even a bad idea or impossible, but it’s the second thing I’d look at. – abarnert Apr 06 '18 at 16:55
  • The _first_ thing I’d look at is whether you can just use `dill` (if this is for storage) or `cloudpickle` (if it’s for distributed processing), and, if so, whether it automatically solves this problem for you. (Also: I don’t know if either of those projects has dropped 2.x compatibility yet. You seem to be using 2.x, based on the `cPickle`, and that could be relevant to what options are available, so you should add the python-2.7 tag.) – abarnert Apr 06 '18 at 16:56

1 Answer

Necroposting.

I ran into the same situation investigating another issue (using Python 3).

Listing:

A pointer is a (starting) memory address where data (a number of bytes) may or may not be stored. In structure members, a pointer is typically used to hold an array of elements (each of the pointed type), just like in your example, for two reasons:

  1. It avoids array limitations (an array member has a fixed size)

  2. It is flexible: the same structure (instance) can be used with different numbers of elements (this partially overlaps with #1.)

But a pointer holds no information about how much memory the stored data occupies (one could argue that it's the pointed type's sizeof, but that is more of a hint).
So, in order to get the right number of bytes from the pointer address (to pickle them, or do anything else with them), the data size must also be retrieved from a source external to the pointer (in our case, another structure member).
If the size is not available, the fallback would be (as stated above) the pointed type's sizeof, but there are cases where that would be incorrect. Things get even more complicated when structures are nested (which happens pretty often). Also, the data size might be expressed in bytes or in elements.
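
To make this concrete, here is a minimal sketch (the buffer and variable names are illustrative, they are not taken from the question). It shows that a CTypes array knows its length while a pointer to the same memory does not, so the element count (or the byte count) has to be supplied from somewhere else:

```python
import ctypes as ct

# Illustrative buffer: three 32-bit unsigned integers.
buf = (ct.c_uint32 * 3)(1, 2, 10)
ptr = ct.cast(buf, ct.POINTER(ct.c_uint32))

# The array type knows its length, the pointer does not.
array_len = len(buf)
try:
    len(ptr)
    pointer_has_len = True
except TypeError:
    pointer_has_len = False

# The element count must come from an external source (here: the array).
elements = ptr[:array_len]

# The same size, expressed in bytes rather than in elements.
byte_count = array_len * ct.sizeof(ct.c_uint32)
raw = ct.string_at(ptr, byte_count)

print(array_len, pointer_has_len, elements, byte_count, len(raw))
```

Passing a count larger than the real one would read past the buffer, which is exactly the risk described here.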

So there is no generic way of knowing how many bytes are stored at an address, and that's precisely why (CTypes) pointers can't be pickled by default (otherwise there might be cases of accessing "forbidden" memory, which is Undefined Behavior and might trigger a SegFault (Access Violation)).
As a consequence, structures (unions, or any other container types) containing pointers should define their own pickling strategy (depending on how they store / interpret data).
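
The default behavior can be observed with a short sketch like the one below. The two structure names are made up for illustration; the assumption is simply that one of them has a pointer member and the other does not:

```python
import ctypes as ct
import pickle


class PlainStruct(ct.Structure):
    # No pointer members: the default CTypes pickling applies.
    _fields_ = (("x", ct.c_int), ("y", ct.c_double))


class PointerStruct(ct.Structure):
    # A single pointer member is enough to disable default pickling.
    _fields_ = (("size", ct.c_uint), ("data", ct.POINTER(ct.c_float)))


# Round trip of the pointer-free structure.
plain = pickle.loads(pickle.dumps(PlainStruct(3, 2.5)))
print(plain.x, plain.y)

# The structure with a pointer is rejected, even if the pointer is NULL.
try:
    pickle.dumps(PointerStruct())
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

The refusal depends on the structure's type, not on what the pointer currently holds, which is why a custom strategy is needed.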

I created a small example.

code00.py:

#!/usr/bin/env python

import ctypes as ct
import pickle
import sys


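def __reduce__(self):
    #print("__reduce__")
    # Build a picklable state: one entry per field, in _fields_ order.
    # A pointer field is stored as a list whose length is the value of
    # the field that comes right before it (assumed to be the size).
    state = []
    ptr_size = None
    for name, typ in self._fields_:
        val = getattr(self, name)
        if issubclass(typ, ct._Pointer):
            state.append(val[:ptr_size])
            ptr_size = None
        else:
            state.append(val)
            ptr_size = val
    #print("pickle state:", state)
    return (self.__class__, (), state)


def __setstate__(self, state):
    #print("__setstate__")
    # Restore fields in the same order; a pointer field gets a freshly
    # allocated array of the pointed type, filled with the saved values.
    for idx, (name, typ) in enumerate(self._fields_):
        if issubclass(typ, ct._Pointer):
            val = state[idx]
            setattr(self, name, (typ._type_ * len(val))(*val))
        else:
            setattr(self, name, state[idx])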


def to_string(self, indent=0, head=True, tail=True, indent_text="  "):
    l = [""] if head else []
    l.append("{:s}{:s}".format(indent_text * indent, str(self)))
    i1 = indent_text * (indent + 1)
    ptr_size = None
    for name, typ in self._fields_:
        val = getattr(self, name)
        inner = getattr(val, to_string.__name__, None)
        if callable(inner):
            l.append("{:s}{:s}: {:}".format(i1, name, inner(indent=indent + 1, head=False, tail=False, indent_text=indent_text)))
        elif issubclass(typ, ct._Pointer):
            l.append("{:s}{:s} ({:}): ({:s})".format(i1, name, val, ", ".join(str(val[e]) for e in range(ptr_size))))
            ptr_size = None
        else:
            l.append("{:s}{:s}: {:}".format(i1, name, val))
            ptr_size = val
    if tail:
        l.append("")
    return "\n".join(l)


FloatPtr = ct.POINTER(ct.c_float)
StrPtr = ct.POINTER(ct.c_char_p)


class Struct0(ct.Structure):
    _fields_ = (
        ("float_size", ct.c_uint),
        ("float_data", FloatPtr),
        ("str_size", ct.c_uint),
        ("str_data", StrPtr),
    )

    '''
    def __getstate__(self):
        print("__getstate__")
        return ""
    '''

Struct0.__reduce__ = __reduce__
Struct0.__setstate__ = __setstate__
Struct0.to_string = to_string


class Struct1(ct.Structure):
    _fields_ = (
        ("struct0", Struct0),
        ("i", ct.c_int),
    )

Struct1.__reduce__ = __reduce__
Struct1.__setstate__ = __setstate__
Struct1.to_string = to_string


def main(*argv):
    floats = (
        3.141593,
        2.718282,
        1.618,
        -1,
    )
    strs = (
        "dummy",
        "",
        "stupid",
        "text",
        "",
    )

    s00 = Struct0()
    s00.float_size = len(floats)
    s00.float_data = (ct.c_float * s00.float_size)(*floats)
    s00.str_size = len(strs)
    s00.str_data = (ct.c_char_p * s00.str_size)(*(ct.c_char_p(e.encode()) for e in strs))

    print("\nORIGINAL:", s00.to_string())
    s00p = pickle.dumps(s00)
    print("PICKLED:\n", s00p)
    s01 = pickle.loads(s00p)
    print("\nUNPICKLED:", s01.to_string())

    #print(dir(s00) == dir(s01), s00.__dict__ == s01.__dict__, s00 == s01, Struct0() == Struct0())

    s10 = Struct1()
    s10.struct0 = s00
    s10.i = -69
    print("\nORIGINAL:", s10.to_string())
    s10p = pickle.dumps(s10)
    print("PICKLED:\n", s10p)
    s11 = pickle.loads(s10p)
    print("\nUNPICKLED:", s11.to_string())


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)

Notes:

  • I created the required methods (__reduce__ and __setstate__, as pickling makes no sense without unpickling), and an additional one (to_string, to display the structure nicely). They all use the same field "browsing" mechanism

  • The code might look a bit too complicated, but I wanted to avoid hardcoding member names or types, so that the methods still work if those change

    • But I did hardcode (although it's not hardcoding per se) the structure's layout: the code heavily relies on the fact that the member holding the size (in elements) comes right before the pointer one, so if the structure's layout changes (I assumed that would be less likely), it will no longer work
  • Since I used the same method implementations in both my structures, I defined them as functions, and "transformed" them into methods after each structure definition

    • Conversely, they only handle structure types like these 2, meaning that there are features (e.g. arrays, pointers to structures, or others that I didn't think about) which are currently not supported. It shouldn't be very hard to add them, but I didn't want to complicate the code even more
  • Bear in mind that converting the data back and forth is costly, so if it's done often, any speed improvement gained by using CTypes (which is its main advantage) will be seriously affected
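
As a variation on the notes above (not part of code00.py), the two methods could also live in a common base class, along the lines of the PicklableStructure idea from the comments, so that every structure inheriting from it becomes picklable. The sketch below also replaces the "size comes right before the pointer" convention with an explicit mapping. The names _ptr_sizes_, UIntArray and Wrapper are hypothetical, and only plain fields, nested structures and pointers to simple types are handled:

```python
import ctypes as ct
import pickle


class PicklableStructure(ct.Structure):
    # Hypothetical convention: pointer field name -> name of the field
    # holding its element count. Subclasses override this mapping.
    _ptr_sizes_ = {}

    def __reduce__(self):
        state = {}
        for name, typ in self._fields_:
            val = getattr(self, name)
            if issubclass(typ, ct._Pointer):
                # Copy exactly as many elements as the size field says.
                count = getattr(self, self._ptr_sizes_[name])
                state[name] = val[:count]
            else:
                # Plain values, or nested structures that pickle themselves.
                state[name] = val
        return (self.__class__, (), state)

    def __setstate__(self, state):
        for name, typ in self._fields_:
            val = state[name]
            if issubclass(typ, ct._Pointer):
                # Allocate a new array of the pointed type for the data.
                val = (typ._type_ * len(val))(*val)
            setattr(self, name, val)


class UIntArray(PicklableStructure):
    # Same layout as my_array in the question: the size follows the pointer.
    _fields_ = (("data", ct.POINTER(ct.c_uint32)), ("size", ct.c_size_t))
    _ptr_sizes_ = {"data": "size"}


class Wrapper(PicklableStructure):
    # No pointers of its own, so no mapping is needed here.
    _fields_ = (("numbs", UIntArray), ("i", ct.c_int))


values = [1, 2, 10]
arr = UIntArray()
arr.data = (ct.c_uint32 * len(values))(*values)
arr.size = len(values)

outer = Wrapper()
outer.numbs = arr
outer.i = 7

clone = pickle.loads(pickle.dumps(outer))
print(clone.i, clone.numbs.size, clone.numbs.data[:clone.numbs.size])
```

With this arrangement, including UIntArray in another structure only requires that structure to inherit from the same base class, instead of redefining __reduce__ each time.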

Output (code00.py):

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q049694832]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" ./code00.py
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32


ORIGINAL:
<__main__.Struct0 object at 0x00000191331E40C0>
  float_size: 4
  float_data (<__main__.LP_c_float object at 0x00000191349C5140>): (3.1415929794311523, 2.7182819843292236, 1.6180000305175781, -1.0)
  str_size: 5
  str_data (<__main__.LP_c_char_p object at 0x00000191349C5140>): (b'dummy', b'', b'stupid', b'text', b'')

PICKLED:
 b'\x80\x04\x95m\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x07Struct0\x94\x93\x94)R\x94]\x94(K\x04]\x94(G@\t!\xfb\x80\x00\x00\x00G@\x05\xbf\n\xa0\x00\x00\x00G?\xf9\xe3T\x00\x00\x00\x00G\xbf\xf0\x00\x00\x00\x00\x00\x00eK\x05]\x94(C\x05dummy\x94C\x00\x94C\x06stupid\x94C\x04text\x94h\x08eeb.'

UNPICKLED:
<__main__.Struct0 object at 0x00000191349C5140>
  float_size: 4
  float_data (<__main__.LP_c_float object at 0x00000191349C5240>): (3.1415929794311523, 2.7182819843292236, 1.6180000305175781, -1.0)
  str_size: 5
  str_data (<__main__.LP_c_char_p object at 0x00000191349C5240>): (b'dummy', b'', b'stupid', b'text', b'')


ORIGINAL:
<__main__.Struct1 object at 0x00000191349C5240>
  struct0:   <__main__.Struct0 object at 0x00000191349C53C0>
    float_size: 4
    float_data (<__main__.LP_c_float object at 0x00000191349C5440>): (3.1415929794311523, 2.7182819843292236, 1.6180000305175781, -1.0)
    str_size: 5
    str_data (<__main__.LP_c_char_p object at 0x00000191349C5440>): (b'dummy', b'', b'stupid', b'text', b'')
  i: -69

PICKLED:
 b'\x80\x04\x95\x88\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x07Struct1\x94\x93\x94)R\x94]\x94(h\x00\x8c\x07Struct0\x94\x93\x94)R\x94]\x94(K\x04]\x94(G@\t!\xfb\x80\x00\x00\x00G@\x05\xbf\n\xa0\x00\x00\x00G?\xf9\xe3T\x00\x00\x00\x00G\xbf\xf0\x00\x00\x00\x00\x00\x00eK\x05]\x94(C\x05dummy\x94C\x00\x94C\x06stupid\x94C\x04text\x94h\x0ceebJ\xbb\xff\xff\xffeb.'

UNPICKLED:
<__main__.Struct1 object at 0x00000191349C5440>
  struct0:   <__main__.Struct0 object at 0x00000191349C53C0>
    float_size: 4
    float_data (<__main__.LP_c_float object at 0x00000191349C54C0>): (3.1415929794311523, 2.7182819843292236, 1.6180000305175781, -1.0)
    str_size: 5
    str_data (<__main__.LP_c_char_p object at 0x00000191349C54C0>): (b'dummy', b'', b'stupid', b'text', b'')
  i: -69


Done.
CristiFati