
I'm writing a Python application that will write a binary file. This file will be parsed by some C code running on an embedded target.

I'm confident that I could do this by deriving from the struct module's Struct class, but its packing format strings are awful, and all my structs are little-endian anyway, so I thought of using the ctypes package.

Let's say that I have the following C structure:

struct my_c_struct
{
    uint32_t    a;
    uint16_t    b;
    uint16_t    table[];
};

On the C side, I operate on that structure using a pointer cast to a memory buffer, so I can do:

uint8_t buf[128];
struct my_c_struct *p = (struct my_c_struct*) buf;
p->table[0] = 0xBEEF;

What's the best way to represent this in Python? My first attempt is:

class MyCStruct(ctypes.LittleEndianStructure):

    c_uint32 = ctypes.c_uint32
    c_uint16 = ctypes.c_uint16
    
    _pack_ = 1

    _fields_ = [
        ("a", c_uint32),
        ("b", c_uint16),
    ]

    def __init__(self, a, b):
        """
        Constructor
        """
        super(MyCStruct, self).__init__(a, b)
        self.table = []

    def pack(self):
        data = bytearray(self.table)
        return bytearray(self)+data

The idea behind the pack() method is that it'll append the variable-length table to the end of the structure. Note that I don't know how many entries table has at object-creation time.

The way I implemented it obviously doesn't work, so I was thinking about nesting the ctypes-derived class in a pure Python class:

class MyCStruct:

    class my_c_struct(ctypes.LittleEndianStructure):
        _pack_ = 1
        _fields_ = [ ("a", ctypes.c_uint32),
                     ("b", ctypes.c_uint16) ]


    def __init__(self, a, b):
        """
        Constructor
        """
        self.c_struct = self.my_c_struct(a,b)
        self.table = []
    
    def pack(self):
        self.c_struct.b = len(self.table)
        x = bytearray(self.c_struct)
        y = bytearray()
        for v in self.table:
            y += struct.pack("<H", v)
        return x + y

Is this a good way of doing this? I don't want to go too deep down the rabbit hole just to find out that there was a better way of doing it.

Caveat: I'm working with Python 2 (please don't ask...), so a Python 3-only solution wouldn't be useful for me, but would be useful for the rest of the universe.

Cheers!

Leonardo
  • Not sure if this helps, but have you looked into the [struct](https://docs.python.org/2.7/library/struct.html) standard library in Python (available for 2.7)? Check out the "Classes" section at the bottom. – Ziyad Edher Sep 11 '20 at 20:09
  • Thanks for the answer! Yes, I did, but the formatting syntax is kinda awful, I'm trying to be as pythonic as possible. – Leonardo Sep 11 '20 at 20:16
  • Have you looked at [Cap'n Proto](https://github.com/capnproto/pycapnp)? It's a variant of protocol buffers which is designed to be less CPU intensive to serialize/deserialize. – Nick ODell Sep 11 '20 at 20:24
  • "On the C side, I operate on that structure using a pointer cast to a memory buffer" -- thus obtaining undefined behavior. You might get away with it, but strict-aliasing violations such as that seem to elicit unwanted behavior more and more often as compilers get cleverer and more aggressive at optimization. – John Bollinger Sep 11 '20 at 21:45
  • @JohnBollinger pfft, just `alias gcc="gcc -fno-strict-aliasing"` :') – Marco Bonelli Sep 11 '20 at 22:44
  • completely off-topic, but the relevant answer is here: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule. thanks @marco! – Leonardo Sep 13 '20 at 01:57

1 Answer


The struct module is really easy to use for this problem (Python 2 code):

>>> import struct
>>> a = 1
>>> b = 2
>>> table = [3,4]
>>> struct.pack('<LH{}H'.format(len(table)),a,b,*table)
'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'

Use .format to insert the number of 16-bit values in table into the format string, and *table to expand table into the correct number of arguments.
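For completeness, the same format machinery can parse the blob back on the Python side. This is a hedged sketch (not part of the original answer); it assumes, as in the question's second attempt, that `b` stores the table length:

```python
import struct

data = b'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'

# unpack the fixed-size header first; here b doubles as the entry count
a, b = struct.unpack_from('<LH', data, 0)
offset = struct.calcsize('<LH')  # 6 bytes, since '<' disables padding
table = list(struct.unpack_from('<{}H'.format(b), data, offset))
print(a, b, table)  # 1 2 [3, 4]
```

`struct.unpack_from` works identically in Python 2.7 and 3, so this side survives a future port.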

Doing this with ctypes is more complicated. This function declares a custom structure with the correct variable array size and populates it, then generates the byte string of the raw data bytes:

#!python2
from ctypes import *

def make_var_struct(a,b,table):
    class Struct(Structure):
        _pack_ = 1
        _fields_ = (('a',c_uint32),
                    ('b',c_uint16),
                    ('table',c_uint16 * len(table)))
    return Struct(a,b,(c_uint16*len(table))(*table))

s = make_var_struct(1,2,[3,4])
print(repr(''.join(buffer(s))))

Output:

'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'
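A Python 3 footnote (an addition, not part of the original answer): `buffer()` no longer exists there, but a ctypes structure supports the buffer protocol directly, so `bytes(s)` extracts the raw data. The sketch below also swaps in `LittleEndianStructure` to pin the byte order regardless of host, matching the question's requirement:

```python
from ctypes import LittleEndianStructure, c_uint16, c_uint32

def make_var_struct(a, b, table):
    # same factory as the answer's, with the byte order fixed explicitly
    class Struct(LittleEndianStructure):
        _pack_ = 1
        _fields_ = (('a', c_uint32),
                    ('b', c_uint16),
                    ('table', c_uint16 * len(table)))
    return Struct(a, b, (c_uint16 * len(table))(*table))

s = make_var_struct(1, 2, [3, 4])
print(bytes(s))  # b'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'
```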
Mark Tolonen
  • That only works if the length of `table` is known when the object is created. – Leonardo Sep 13 '20 at 01:58
  • @Leonardo That’s generally the case when building this type of structure. You have to allocate the memory for the table. What’s your use case? – Mark Tolonen Sep 13 '20 at 02:17
  • In the real application `table` holds a CRC table for a variable number of data blocks (think of an ELF file's program headers). You're right that I could have a separate list with the CRCs and only when all of them have been calculated, create the packed structure. I thought of using the Python object to both hold the CRC table and return the binary representation of itself. – Leonardo Sep 14 '20 at 14:15
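One way to reconcile the two constraints discussed in this thread (grow the table freely, but let ctypes size the array) is to move the class factory inside pack(), so the fixed-size type is only declared once the length is known. A hedged sketch; the field layout mirrors the question, and the lazy construction is the only new idea:

```python
import ctypes

class MyCStruct(object):
    def __init__(self, a):
        self.a = a
        self.table = []  # grows as entries (e.g. CRCs) are computed

    def pack(self):
        # declare the concrete ctypes type only now, when len(table) is known
        n = len(self.table)
        class _S(ctypes.LittleEndianStructure):
            _pack_ = 1
            _fields_ = [('a', ctypes.c_uint32),
                        ('b', ctypes.c_uint16),
                        ('table', ctypes.c_uint16 * n)]
        s = _S(self.a, n, (ctypes.c_uint16 * n)(*self.table))
        return bytearray(s)

m = MyCStruct(1)
m.table += [3, 4]
print(bytes(m.pack()))  # b'\x01\x00\x00\x00\x02\x00\x03\x00\x04\x00'
```

Declaring a class per pack() call has some overhead, but it keeps a single object that both accumulates the table and serializes itself, which is what the question asks for, and it runs unchanged on Python 2.7 and 3.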