I'm trying to work with a very large numpy array through numpy.memmap, accessing each element as a ctypes Structure.
    import ctypes
    from ctypes import Structure, c_uint32

    import numpy as np

    class My_Structure(Structure):
        # Seven bitfields that together fill one 32-bit word (3+2+2+9+12+2+2 = 32)
        _fields_ = [('field1', c_uint32, 3),
                    ('field2', c_uint32, 2),
                    ('field3', c_uint32, 2),
                    ('field4', c_uint32, 9),
                    ('field5', c_uint32, 12),
                    ('field6', c_uint32, 2),
                    ('field7', c_uint32, 2)]

        def __str__(self):
            return f'MyStruct -- f1{self.field1} f2{self.field2} f3{self.field3} f4{self.field4} f5{self.field5} f6{self.field6} f7{self.field7}'

        def __eq__(self, other):
            # Field-by-field comparison
            for field in self._fields_:
                if getattr(self, field[0]) != getattr(other, field[0]):
                    return False
            return True

    _big_array = np.memmap(filename = 'big_file.data',
                           dtype = 'uint32',
                           mode = 'w+',
                           shape = 1000000
                           )
    # Reinterpret the memmap's buffer as a pointer to My_Structure
    big_array = _big_array.ctypes.data_as(ctypes.POINTER(My_Structure))
    big_array[0].field1 = 5
    ...
It seems to work correctly, but I'm getting a fault on a 64-bit Windows machine where python.exe simply stops. In Event Viewer, I see that the faulting module name is _ctypes.pyd and the exception code is 0xc0000005, which I believe is an access violation.
I don't seem to be getting the same error on Linux, though my testing has not been thorough.
My questions are:
1. Does my access look correct, i.e. am I using numpy.memmap.ctypes.data_as correctly?
2. Does the fact that I have functions (__str__ and __eq__) defined on My_Structure change its size, i.e. can it still be used in the array as a uint32?
3. Is there anything that you think might cause this behavior, particularly considering the differences between Windows and Linux?
EDIT:
Using ctypes.addressof and ctypes.sizeof on big_array elements, it looks like __str__ and __eq__ do not impact the size of My_Structure.
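For reference, a minimal sketch of that size check. The class names Plain and WithMethods are made up for the comparison; the field layout is the one from the code above.

```python
import ctypes

# Shared bitfield layout, copied from My_Structure in the question.
_FIELDS = [('field1', ctypes.c_uint32, 3),
           ('field2', ctypes.c_uint32, 2),
           ('field3', ctypes.c_uint32, 2),
           ('field4', ctypes.c_uint32, 9),
           ('field5', ctypes.c_uint32, 12),
           ('field6', ctypes.c_uint32, 2),
           ('field7', ctypes.c_uint32, 2)]


class Plain(ctypes.Structure):
    # Hypothetical comparison class: fields only, no methods.
    _fields_ = _FIELDS


class WithMethods(ctypes.Structure):
    # Hypothetical comparison class: same fields plus Python-level methods.
    _fields_ = _FIELDS

    def __str__(self):
        return f'WithMethods -- f1{self.field1}'

    def __eq__(self, other):
        return all(getattr(self, name) == getattr(other, name)
                   for name, *_ in self._fields_)


# Methods live on the Python class object, not in the C memory layout,
# so both sizes should match the size of a single c_uint32.
plain_size = ctypes.sizeof(Plain)
methods_size = ctypes.sizeof(WithMethods)
print(plain_size, methods_size, ctypes.sizeof(ctypes.c_uint32))
```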
I added some asserts before my access to big_array and found that I was attempting to access big_array[-1], which explains the access error and crash.
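A bare ctypes POINTER carries no length, so indexing it with -1 reads the memory just before the buffer instead of wrapping around. One possible way to get bounds checking while keeping the Structure view is a fixed-length ctypes array created with from_buffer on the memmap. The sketch below assumes that approach; the smaller N and the temporary file path are placeholders for the demo, not values from my real code.

```python
import ctypes
import os
import tempfile

import numpy as np


class My_Structure(ctypes.Structure):
    # Same 32-bit bitfield layout as in the question.
    _fields_ = [('field1', ctypes.c_uint32, 3),
                ('field2', ctypes.c_uint32, 2),
                ('field3', ctypes.c_uint32, 2),
                ('field4', ctypes.c_uint32, 9),
                ('field5', ctypes.c_uint32, 12),
                ('field6', ctypes.c_uint32, 2),
                ('field7', ctypes.c_uint32, 2)]


N = 1000  # placeholder size, smaller than the real array to keep the demo quick
path = os.path.join(tempfile.mkdtemp(), 'big_file.data')  # placeholder location
_big_array = np.memmap(path, dtype='uint32', mode='w+', shape=N)

# A ctypes array type knows its length, unlike POINTER(My_Structure).
# from_buffer shares the memmap's memory; nothing is copied.
big_array = (My_Structure * N).from_buffer(_big_array)

big_array[-1].field1 = 5            # negative index wraps to the last element
last_word = int(_big_array[N - 1])  # the write is visible through the memmap

try:
    big_array[N]                    # one past the end
    out_of_range_raised = False
except IndexError:
    out_of_range_raised = True
```

With this, an out-of-range index raises IndexError in Python rather than crashing the interpreter. The ctypes array keeps the memmap's buffer exported for as long as it exists.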
That leaves question 1 open: my code appears to be technically correct, but I'm wondering whether there is a better way to access the numpy array than through a ctypes pointer, so that I still get the benefits of a numpy array (out-of-bounds errors, negative index wrapping, etc.). Daniel below suggested using a structured numpy array, but is it possible to do bitfield access with this?
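As far as I can tell, numpy dtype fields are addressed at byte granularity, so one alternative I can think of is to keep the data as a plain uint32 array and pull the bitfields out with shifts and masks. The sketch below is only an illustration: get_field, set_field and the FIELDS table are made-up names, and the bit offsets assume a little-endian platform where the first declared field occupies the least significant bits. A small in-memory array stands in for the memmap, which is indexed the same way.

```python
import numpy as np

# name -> (bit offset, bit width). ASSUMPTION: fields are allocated starting
# from the least significant bit, in declaration order (little-endian layout).
FIELDS = {
    'field1': (0, 3),
    'field2': (3, 2),
    'field3': (5, 2),
    'field4': (7, 9),
    'field5': (16, 12),
    'field6': (28, 2),
    'field7': (30, 2),
}


def get_field(words, index, name):
    """Read one bitfield from the uint32 at words[index]."""
    offset, width = FIELDS[name]
    mask = (1 << width) - 1
    return (int(words[index]) >> offset) & mask


def set_field(words, index, name, value):
    """Write one bitfield into the uint32 at words[index]."""
    offset, width = FIELDS[name]
    mask = (1 << width) - 1
    if not 0 <= value <= mask:
        raise ValueError(f'{name} must fit in {width} bits')
    word = int(words[index])
    # Clear the field's bits, then OR in the new value.
    words[index] = (word & ~(mask << offset)) | (value << offset)


words = np.zeros(8, dtype=np.uint32)   # stand-in for the memmap
set_field(words, -1, 'field5', 0xABC)  # numpy wraps the negative index
value_read_back = get_field(words, 7, 'field5')

try:
    get_field(words, 8, 'field1')      # out of bounds: numpy raises IndexError
    index_error_raised = False
except IndexError:
    index_error_raised = True
```

The indexing here is ordinary numpy indexing, so negative indices wrap and out-of-range indices raise IndexError. The cost is that every field access goes through a Python-level helper.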