
I have to read an array of C structs returned from a DLL and convert it to a NumPy array. The code uses Python's cffi module.

The code works so far but I don't know how to handle the member padding in the struct that np.frombuffer complains about:

ValueError: buffer size must be a multiple of element size

This is my code:

from cffi import FFI
import numpy as np

s = '''
    typedef struct
    {
        int a;
        int b;
        float c;
        double d;
    } mystruct;
    '''

ffi = FFI()
ffi.cdef(s)

res = []

#create array and fill with dummy data
for k in range(2):

    m = ffi.new("mystruct *")

    m.a = k
    m.b = k + 1
    m.c = k + 2.0
    m.d = k + 3.0

    # append inside the loop so that every struct is collected, not just the last one
    res.append(m[0])

m_arr = ffi.new("mystruct[]", res)

print(m_arr)

# dtype for structured array in Numpy
dt = [('a', 'i4'),
      ('b', 'i4'),
      ('c', 'f4'),
      ('d', 'f8')]

# member size, 20 bytes
print('size, manually', 4 + 4 + 4 + 8)

# total size of struct, 24 bytes
print('sizeof', ffi.sizeof(m_arr[0]))

#reason is member padding in structs

buf = ffi.buffer(m_arr)
print(buf)

x = np.frombuffer(buf, dtype=dt)
print(x)

Any ideas how to handle this in a clean way?


Edit:

It seems to work if I add an additional number to the dtype where the padding is supposed to happen:

dt = [('a', 'i4'),
      ('b', 'i4'),
      ('c', 'f4'),
      ('pad', 'f4'),
      ('d', 'f8')]

Why does the padding happen there? (Win7, 64-bit, Python 3.4 64-bit).

But that can't be the best way. The real code is much more complicated and dynamic, so it should be possible to handle this somehow, right?

Joe
  • I found the line "In general, a struct instance will have the alignment of its widest scalar member." (http://www.catb.org/esr/structure-packing/#_structure_reordering) But later they mention it is possible to re-order a struct. But why does this work if every member takes the same space? – Joe Jan 24 '18 at 14:17
  • Most-likely, as you mentioned yourself, the C struct is padded. For what this actually means, check out [this post](https://stackoverflow.com/a/4306269/3996454). Therefore, when you define your dt as mentioned in your edit, you align to the padded struct by inserting a "gap" yourself. The programmer of the DLL would need to pack the struct in order for you to use your original dt definition. – TacoVox Jan 24 '18 at 14:23
  • Your member `d` appears to be a 64-bit floating-point number (presumably a C `double`). The layout you present in your edit ensures that that member is aligned on an 8-byte boundary whenever the whole structure is likewise aligned, and also ensures that adjacent array elements of that type can both be 8-byte aligned. – John Bollinger Jan 24 '18 at 14:40
  • But if, as usually advised, the largest members are put first, there would still be a trailing padding. If I got the concept, then if the sorted order is `double, int, int, float`, there are still four bytes to fill the gap to a multiple of double. Is that correct? – Joe Jan 24 '18 at 14:43
  • Yes, Joe, that is correct. Reordering the elements *might* avoid padding between them, but if the compiler is laying out the structure to ensure that the `double` can be aligned on an 8-byte boundary then it must require that the whole structure have an 8-byte (at least) alignment requirement. And the size of a type's representation must always be a multiple of its alignment requirement, else you could not form arrays of that type. – John Bollinger Jan 24 '18 at 14:48
  • Your compiler (MSVC++?) probably has an extension that allows you to request tighter packing, possibly without padding, sacrificing optimal alignment. Some compilers spell that `#pragma pack` or similar, but do look up the details before trying to use such a thing. How that might affect the rest of the stack is difficult for me to say. – John Bollinger Jan 24 '18 at 14:52
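
The layout rules discussed in these comments can be checked without the DLL. The following is a minimal sketch that uses the standard-library `ctypes` module as a stand-in for the C compiler; the class names `MyStruct` and `Reordered` are made up for illustration, and the sizes in the comments assume a typical 64-bit platform where `double` is 8-byte aligned.

```python
import ctypes

# Same member order as the struct in the question.
class MyStruct(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int),
                ("b", ctypes.c_int),
                ("c", ctypes.c_float),
                ("d", ctypes.c_double)]

# Hypothetical reordering with the widest member first.
class Reordered(ctypes.Structure):
    _fields_ = [("d", ctypes.c_double),
                ("a", ctypes.c_int),
                ("b", ctypes.c_int),
                ("c", ctypes.c_float)]

# Offsets show where the padding sits: 'c' ends at byte 12, but 'd'
# starts at byte 16 so that it lies on an 8-byte boundary.
for name, _ctype in MyStruct._fields_:
    print(name, getattr(MyStruct, name).offset)

# Typically 24 in both cases: reordering only moves the 4 padding
# bytes to the end, because the struct size must stay a multiple of
# its alignment for arrays of the struct to work.
print(ctypes.sizeof(MyStruct), ctypes.sizeof(Reordered))
```

This matches the observation in the edit above: the extra 4-byte field has to go between `c` and `d`, because that is where the compiler inserts the gap.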

2 Answers


Probably the most convenient way is to pass the keyword align=True to the NumPy dtype constructor. NumPy then inserts the padding automatically, the way a C compiler would.

dt = [('a', 'i4'),
      ('b', 'i4'),
      ('c', 'f4'),
      ('d', 'f8')]

dt_obj = np.dtype(dt, align=True)
x = np.frombuffer(buf, dtype=dt_obj)

(see also Numpy doc on structured arrays)
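
To see what `align=True` changes, here is a small self-contained sketch. It does not use cffi: as an assumption for illustration, the raw bytes are produced with `struct.pack` in native-alignment mode (`@`), which stands in for the buffer coming from the DLL. The sizes in the comments assume a typical 64-bit platform.

```python
import struct
import numpy as np

fields = [('a', 'i4'),
          ('b', 'i4'),
          ('c', 'f4'),
          ('d', 'f8')]

packed_dt = np.dtype(fields)               # no padding between members
aligned_dt = np.dtype(fields, align=True)  # C-like padding added

print(packed_dt.itemsize)         # 20
print(aligned_dt.itemsize)        # typically 24
print(aligned_dt.fields['d'][1])  # byte offset of 'd', typically 16

# Stand-in for the DLL buffer: two records laid out with native alignment.
raw = b"".join(struct.pack("@iifd", k, k + 1, k + 2.0, k + 3.0)
               for k in range(2))

x = np.frombuffer(raw, dtype=aligned_dt)
print(x['a'], x['d'])
```

Comparing `aligned_dt.itemsize` with `ffi.sizeof("mystruct")` before calling `np.frombuffer` is a cheap sanity check that both sides agree on the layout.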

Fomalhaut
Joe

In addition to the other answers given in the comments, you can force cffi to pack its structs (i.e. not insert any padding, similar to what you can do using specific C compiler extensions):

ffi.cdef("typedef struct { char a; int b; } foo_t;", packed=True)
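
Applied to the struct from the question, this could look like the sketch below. It is only valid under the assumption that the DLL itself was compiled with packed structs; if the DLL uses the default padded layout, declaring the struct as packed in cffi would misdescribe the memory and produce wrong values.

```python
from cffi import FFI
import numpy as np

ffi = FFI()
# packed=True: cffi lays out the struct without any padding.
# Assumption: the library that fills the array uses the same packing.
ffi.cdef("""
    typedef struct
    {
        int a;
        int b;
        float c;
        double d;
    } mystruct;
""", packed=True)

# Create an array of two structs and fill it with dummy data.
m_arr = ffi.new("mystruct[]", 2)
for k in range(2):
    m_arr[k].a = k
    m_arr[k].b = k + 1
    m_arr[k].c = k + 2.0
    m_arr[k].d = k + 3.0

# Plain (unaligned) structured dtype: 4 + 4 + 4 + 8 = 20 bytes.
packed_dt = np.dtype([('a', 'i4'),
                      ('b', 'i4'),
                      ('c', 'f4'),
                      ('d', 'f8')])

# Both sides must agree on the element size.
print(ffi.sizeof("mystruct"), packed_dt.itemsize)

x = np.frombuffer(ffi.buffer(m_arr), dtype=packed_dt)
print(x)
```

The resulting array shares memory with `m_arr`, so `m_arr` has to stay alive for as long as `x` is in use.
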
Armin Rigo