How to handle member padding in C struct when reading cffi.buffer with numpy.frombuffer?

Question

I have to read an array of C structs returned from a dll and convert it to a Numpy array. The code uses Python's cffi module.

The code works so far but I don't know how to handle the member padding in the struct that np.frombuffer complains about:

ValueError: buffer size must be a multiple of element size

This is my code:

from cffi import FFI
import numpy as np

s = '''
    typedef struct
    {
        int a;
        int b;
        float c;
        double d;
    } mystruct;
    '''

ffi = FFI()
ffi.cdef(s)

res = []

#create array and fill with dummy data
for k in range(2):

    m = ffi.new("mystruct *")

    m.a = k
    m.b = k + 1
    m.c = k + 2.0
    m.d = k + 3.0

    res.append(m[0])

m_arr = ffi.new("mystruct[]", res)

print(m_arr)

# dtype for structured array in Numpy
dt = [('a', 'i4'),
      ('b', 'i4'),
      ('c', 'f4'),
      ('d', 'f8')]

# member size, 20 bytes
print('size, manually', 4 + 4 + 4 + 8)

# total size of struct, 24 bytes
print('sizeof', ffi.sizeof(m_arr[0]))

#reason is member padding in structs

buf = ffi.buffer(m_arr)
print(buf)

x = np.frombuffer(buf, dtype=dt)
print(x)

Any ideas how to handle this in a clean way?

Edit:

It seems to work if I add an additional number to the dtype where the padding is supposed to happen:

dt = [('a', 'i4'),
      ('b', 'i4'),
      ('c', 'f4'),
      ('pad', 'f4'),
      ('d', 'f8')]

Why does the padding happen there? (Win7, 64-bit, Python 3.4 64-bit).

But that can't be the best way. The real code is much more complicated and dynamic, so it should be possible to handle this somehow, right?

I found the line "In general, a struct instance will have the alignment of its widest scalar member." (http://www.catb.org/esr/structure-packing/#_structure_reordering) But later they mention it is possible to re-order a struct. But why does this work if every member takes the same space? — Joe, Jan 24 '18 at 14:17
Most likely, as you mentioned yourself, the C struct is padded. For what this actually means, check out [this post](https://stackoverflow.com/a/4306269/3996454). When you define your dt as in your edit, you match the padded struct by inserting the "gap" yourself. The programmer of the DLL would need to pack the struct in order for you to use your original dt definition. — TacoVox, Jan 24 '18 at 14:23
Your member `d` appears to be a 64-bit floating-point number (presumably a C `double`). The layout you present in your edit ensures that that member is aligned on an 8-byte boundary whenever the whole structure is likewise aligned, and also ensures that adjacent array elements of that type can both be 8-byte aligned. — John Bollinger, Jan 24 '18 at 14:40
But if, as usually advised, the largest members are put first, there would still be a trailing padding. If I got the concept, then if the sorted order is `double, int, int, float`, there are still four bytes to fill the gap to a multiple of double. Is that correct? — Joe, Jan 24 '18 at 14:43
Yes, Joe, that is correct. Reordering the elements *might* avoid padding between them, but if the compiler is laying out the structure to ensure that the `double` can be aligned on an 8-byte boundary then it must require that the whole structure have an 8-byte (at least) alignment requirement. And the size of a type's representation must always be a multiple of its alignment requirement, else you could not form arrays of that type. — John Bollinger, Jan 24 '18 at 14:48
Your compiler (MSVC++?) probably has an extension that allows you to request tighter packing, possibly without padding, sacrificing optimal alignment. Some compilers spell that `#pragma pack` or similar, but do look up the details before trying to use such a thing. How that might affect the rest of the stack is difficult for me to say. — John Bollinger, Jan 24 '18 at 14:52
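
The trailing-padding point from the comments above can be checked without a C compiler. The following is only a sketch: it uses the standard-library ctypes module as a stand-in for the DLL's compiler, and the class names `Original` and `Reordered` and the helper `layout` are made up for illustration. On the common 64-bit ABIs both layouts should come out at the same total size, because the size is rounded up to a multiple of the alignment of `double`.

```python
import ctypes

class Original(ctypes.Structure):
    # Same member order as mystruct in the question
    _fields_ = [('a', ctypes.c_int), ('b', ctypes.c_int),
                ('c', ctypes.c_float), ('d', ctypes.c_double)]

class Reordered(ctypes.Structure):
    # Widest member first, as suggested in the structure-packing article
    _fields_ = [('d', ctypes.c_double), ('a', ctypes.c_int),
                ('b', ctypes.c_int), ('c', ctypes.c_float)]

def layout(struct_type):
    """Return a list of (name, offset, size) per member and the total size."""
    members = []
    for name, _ in struct_type._fields_:
        field = getattr(struct_type, name)
        members.append((name, field.offset, field.size))
    return members, ctypes.sizeof(struct_type)

original_members, original_size = layout(Original)
reordered_members, reordered_size = layout(Reordered)

# Sum of the member sizes alone, without any padding
payload = sum(size for _, _, size in original_members)

print('payload bytes:', payload)
print('original :', original_members, 'sizeof =', original_size)
print('reordered:', reordered_members, 'sizeof =', reordered_size)
```

In the original order the gap sits in front of `d`; in the reordered version it moves to the end of the struct, so reordering does not shrink this particular struct.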

score 2 · Accepted Answer · edited Oct 11 '21 at 08:46

Probably the most convenient way is to pass align=True to the numpy dtype constructor. That makes numpy insert the padding automatically, as a C compiler typically would for the equivalent struct.

dt = [('a', 'i4'),
      ('b', 'i4'),
      ('c', 'f4'),
      ('d', 'f8')]

dt_obj = np.dtype(dt, align=True)
x = np.frombuffer(buf, dtype=dt_obj)
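
To see what align=True changes, here is a small self-contained sketch. It does not need the DLL or cffi: as an assumption for illustration, the buffer is built with the standard-library struct module in native-alignment mode (`'@iifd'`), which stands in for `ffi.buffer(m_arr)`. The names `packed_dt`, `aligned_dt` and `raw` are made up for this example.

```python
import struct
import numpy as np

fields = [('a', 'i4'), ('b', 'i4'), ('c', 'f4'), ('d', 'f8')]

packed_dt = np.dtype(fields)               # members back to back, no padding
aligned_dt = np.dtype(fields, align=True)  # padding added as for a C struct

# Stand-in for ffi.buffer(m_arr): two records laid out with native alignment
raw = (struct.pack('@iifd', 0, 1, 2.0, 3.0) +
       struct.pack('@iifd', 1, 2, 3.0, 4.0))

x = np.frombuffer(raw, dtype=aligned_dt)

print('itemsize without align:', packed_dt.itemsize)
print('itemsize with align   :', aligned_dt.itemsize)
print('offsets with align    :',
      {name: aligned_dt.fields[name][1] for name in aligned_dt.names})
print(x)
```

The offsets printed for the aligned dtype show where numpy placed the gap, which is the same place the manual `('pad', 'f4')` entry from the question fills.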

(see also Numpy doc on structured arrays)
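
For the "more complicated and dynamic" case from the question, another option is to skip guessing the padding and take the offsets from cffi itself. This is a sketch under assumptions, not part of the original answer: the helper `dtype_from_struct` and the lookup table `C_TO_NUMPY` are hypothetical, the table only covers the three scalar types used here, and nested structs, arrays and bitfields are not handled.

```python
from cffi import FFI
import numpy as np

ffi = FFI()
ffi.cdef("typedef struct { int a; int b; float c; double d; } mystruct;")

# Hypothetical lookup table; extend it for the scalar types your structs use
C_TO_NUMPY = {'int': np.intc, 'float': np.single, 'double': np.double}

def dtype_from_struct(ffi, type_name):
    """Build a structured dtype from the member offsets cffi reports."""
    fields = ffi.typeof(type_name).fields  # list of (name, field) pairs
    return np.dtype({
        'names': [name for name, _ in fields],
        'formats': [C_TO_NUMPY[field.type.cname] for _, field in fields],
        'offsets': [field.offset for _, field in fields],
        'itemsize': ffi.sizeof(type_name),  # includes any trailing padding
    })

dt_obj = dtype_from_struct(ffi, "mystruct")

# Same dummy data as in the question, passed as dict initializers
m_arr = ffi.new("mystruct[]",
                [{'a': k, 'b': k + 1, 'c': k + 2.0, 'd': k + 3.0}
                 for k in range(2)])

x = np.frombuffer(ffi.buffer(m_arr), dtype=dt_obj)
print(dt_obj)
print(x)
```

Because the offsets and the item size both come from cffi, the dtype follows whatever layout cffi assumes for the struct, including a packed one. Keep `m_arr` alive for as long as `x` is in use, since `x` is a view on that memory and not a copy.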

edited Oct 11 '21 at 08:46 by Fomalhaut · answered Feb 08 '18 at 13:59 by Joe

score 1 · Answer 2 · answered Jan 24 '18 at 20:55

In addition to the other answers given in the comments, you can force cffi to pack its structs (i.e. not insert any padding, similar to what you can do using specific C compiler extensions):

ffi.cdef("typedef struct { char a; int b; } foo_t;", packed=True)
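
A short sketch of the effect, applied to the struct from the question. The variable names are made up for illustration. One caveat to keep in mind: packed=True only changes the layout that cffi assumes. It matches the data only if the DLL itself was compiled with the struct packed; if the DLL uses the default padded layout, declaring it packed on the Python side would make cffi read the members from the wrong offsets.

```python
from cffi import FFI

decl = "typedef struct { int a; int b; float c; double d; } mystruct;"

# Default layout: cffi applies the platform's usual alignment rules
ffi_default = FFI()
ffi_default.cdef(decl)

# Packed layout: members are placed back to back, without padding
ffi_packed = FFI()
ffi_packed.cdef(decl, packed=True)

default_size = ffi_default.sizeof("mystruct")
packed_size = ffi_packed.sizeof("mystruct")

print('default sizeof:', default_size)
print('packed sizeof :', packed_size)
```

With the packed declaration, the size equals the plain sum of the member sizes, so the original unpadded dt from the question would fit a buffer of such structs.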
