
I'm writing a Python application that will write a binary file. This file will be parsed by some C code running on an embedded target.

I'm confident that I could do this by deriving from the struct module's Struct class, but its packing format strings are awful, and all my structs are little-endian anyway, so I thought of using the ctypes package.

Let's say that I have the following C structure:

struct my_c_struct
{
    uint32_t    a;
    uint16_t    b;
    uint16_t    table[];
};

On the C side, I operate on that structure using a pointer cast to a memory buffer, so I can do:

uint8_t buf[128];
struct my_c_struct *p = (struct my_c_struct*) buf;
p->table[0] = 0xBEEF;

What's the best way to represent this in Python? My first attempt is:

class MyCStruct(ctypes.LittleEndianStructure):

    c_uint32 = ctypes.c_uint32
    c_uint16 = ctypes.c_uint16
    
    _pack_ = 1

    _fields_ = [
        ("a", c_uint32),
        ("b", c_uint16),
    ]

    def __init__(self, a, b):
        """
        Constructor
        """
        super(MyCStruct, self).__init__(a, b)
        self.table = []

    def pack(self):
        data = bytearray(self.table)
        return bytearray(self)+data

The idea behind the pack() method is that it'll append the variable-length table to the end of the structure. Note that I don't know how many entries table has at object-creation time.

The way I implemented it obviously doesn't work, so I was thinking about nesting the ctypes-derived class in a pure Python class:

class MyCStruct:

    class my_c_struct(ctypes.LittleEndianStructure):
        _pack_ = 1
        _fields_ = [ ("a", ctypes.c_uint32),
                     ("b", ctypes.c_uint16) ]


    def __init__(self, a, b):
        """
        Constructor
        """
        self.c_struct = self.my_c_struct(a,b)
        self.table = []
    
    def pack(self):
        self.c_struct.b = len(self.table)
        x = bytearray(self.c_struct)
        y = bytearray()
        for v in self.table:
            y += struct.pack("<H", v)
        return x + y

Is this a good way of doing this? I don't want to go too deep down the rabbit hole just to find out that there was a better way of doing it.

Caveat: I'm working with Python 2 (please don't ask...), so a Python 3-only solution wouldn't be useful for me, but would be useful for the rest of the universe.

Cheers!

Leonardo
  • Not sure if this helps, but have you looked into the [struct](https://docs.python.org/2.7/library/struct.html) standard library in Python (available for 2.7)? Check out the "Classes" section at the bottom. – Ziyad Edher Sep 11 '20 at 20:09
  • Thanks for the answer! Yes, I did, but the formatting syntax is kinda awful, I'm trying to be as pythonic as possible. – Leonardo Sep 11 '20 at 20:16
  • Have you looked at [Cap'n Proto](https://github.com/capnproto/pycapnp)? It's a variant of protocol buffers which is designed to be less CPU intensive to serialize/deserialize. – Nick ODell Sep 11 '20 at 20:24
  • "On the C side, I operate on that structure using a pointer cast to a memory buffer" -- thus obtaining undefined behavior. You might get away with it, but strict-aliasing violations such as that seem to elicit unwanted behavior more and more often as compilers get cleverer and more aggressive at optimization. – John Bollinger Sep 11 '20 at 21:45
  • @JohnBollinger pfft, just `alias gcc="gcc -fno-strict-aliasing"` :') – Marco Bonelli Sep 11 '20 at 22:44
  • completely off-topic, but the relevant answer is here: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule. thanks @marco! – Leonardo Sep 13 '20 at 01:57

1 Answer


The struct module is really easy to use for this problem (Python 2 code):

>>> import struct
>>> a = 1
>>> b = 2
>>> table = [3,4]
>>> struct.pack('<LH{}H'.format(len(table)),a,b,*table)
'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'

Use .format to insert the number of 16-bit values in table into the format string, and *table to expand table into the correct number of arguments.
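For completeness, the same format machinery can parse the blob back on the Python side. This is a hedged sketch (not part of the original answer); it assumes, as in the question's second attempt, that `b` stores the table length:

```python
import struct

data = b'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'

# unpack the fixed-size header first; here b doubles as the entry count
a, b = struct.unpack_from('<LH', data, 0)
offset = struct.calcsize('<LH')  # 6 bytes, since '<' disables padding
table = list(struct.unpack_from('<{}H'.format(b), data, offset))
print(a, b, table)  # 1 2 [3, 4]
```

`struct.unpack_from` works identically in Python 2.7 and 3, so this side survives a future port.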

Doing this with ctypes is more complicated. This function declares a custom structure with the correct variable array size and populates it, then generates the byte string of the raw data bytes:

#!python2
from ctypes import *

def make_var_struct(a,b,table):
    class Struct(Structure):
        _pack_ = 1
        _fields_ = (('a',c_uint32),
                    ('b',c_uint16),
                    ('table',c_uint16 * len(table)))
    return Struct(a,b,(c_uint16*len(table))(*table))

s = make_var_struct(1,2,[3,4])
print(repr(''.join(buffer(s))))

Output:

'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'
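A Python 3 footnote (an addition, not part of the original answer): `buffer()` no longer exists there, but a ctypes structure supports the buffer protocol directly, so `bytes(s)` extracts the raw data. The sketch below also swaps in `LittleEndianStructure` to pin the byte order regardless of host, matching the question's requirement:

```python
from ctypes import LittleEndianStructure, c_uint16, c_uint32

def make_var_struct(a, b, table):
    # same factory as the answer's, with the byte order fixed explicitly
    class Struct(LittleEndianStructure):
        _pack_ = 1
        _fields_ = (('a', c_uint32),
                    ('b', c_uint16),
                    ('table', c_uint16 * len(table)))
    return Struct(a, b, (c_uint16 * len(table))(*table))

s = make_var_struct(1, 2, [3, 4])
print(bytes(s))  # b'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'
```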
Mark Tolonen
  • That only works if the length of `table` is known when the object is created. – Leonardo Sep 13 '20 at 01:58
  • @Leonardo That’s generally the case when building this type of structure. You have to allocate the memory for the table. What’s your use case? – Mark Tolonen Sep 13 '20 at 02:17
  • In the real application `table` holds a CRC table for a variable number of data blocks (think of an ELF file's program headers). You're right that I could have a separate list with the CRCs and only when all of them have been calculated, create the packed structure. I thought of using the Python object to both hold the CRC table and return the binary representation of itself. – Leonardo Sep 14 '20 at 14:15
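One way to reconcile the two constraints discussed in this thread (grow the table freely, but let ctypes size the array) is to move the class factory inside pack(), so the fixed-size type is only declared once the length is known. A hedged sketch; the field layout mirrors the question, and the lazy construction is the only new idea:

```python
import ctypes

class MyCStruct(object):
    def __init__(self, a):
        self.a = a
        self.table = []  # grows as entries (e.g. CRCs) are computed

    def pack(self):
        # declare the concrete ctypes type only now, when len(table) is known
        n = len(self.table)
        class _S(ctypes.LittleEndianStructure):
            _pack_ = 1
            _fields_ = [('a', ctypes.c_uint32),
                        ('b', ctypes.c_uint16),
                        ('table', ctypes.c_uint16 * n)]
        s = _S(self.a, n, (ctypes.c_uint16 * n)(*self.table))
        return bytearray(s)

m = MyCStruct(1)
m.table += [3, 4]
print(bytes(m.pack()))  # b'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'
```

Declaring a class per pack() call has some overhead, but it keeps a single object that both accumulates the table and serializes itself, which is what the question asks for, and it runs unchanged on Python 2.7 and 3.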