I'm writing a parser for a binary format. The format consists of a number of tables (somewhere between 50 and 100 of them), each of which is itself binary and usually contains fields of varying sizes.

Most of these structures will have bitfields and will look something like these when represented in C:

struct myHeader
{
  unsigned char fieldA : 3;
  unsigned char fieldB : 2;
  unsigned char fieldC : 3;
  unsigned short fieldD : 14;
  unsigned char fieldE : 4;
};

I came across the struct module, but its finest resolution is a byte, not a bit. Otherwise the module would have been pretty much the right fit for this work.

I know ctypes supports bitfields, but I'm not sure how to use ctypes structures containing bitfields to parse raw binary data like this.

My other option is to manipulate the bits myself, assemble them into bytes, and use those with the struct module. But since I have close to 50-100 different types of such structures, writing that code by hand becomes error-prone. I'm also worried about efficiency, since this tool might be used to parse many gigabytes of binary data.

Thanks.

Tuxdude
  • there are also 3rd party bit array / bit manipulation libraries. – agf Aug 25 '11 at 23:34
  • It would be a fair amount of work, but you could probably design a class that could parse C-style structure definitions (or something similar to them that eliminated packing ambiguity) into a set of masks for each bitfield, read the data in via the struct module to get to the byte level, and offer `__getattr__` access. – Russell Borogove Aug 26 '11 at 00:17
  • Yes, I have now come across these tools - [python-bitstring](http://code.google.com/p/python-bitstring/), [Construct](http://construct.wikispaces.com/tut-basics), [BitReader](https://bitbucket.org/jtoivola/bitreader/wiki/Home) - and am reading through their docs. BitReader seems like a viable solution, but I see [here](http://blog.mfabrik.com/2010/09/08/bitreader-python-module-for-reading-bits-from-bytes/) that performance takes a big hit. Construct, as far as I could find from its basic documentation, doesn't support bit fields. Python-bitstring sounds promising and I need to dig in a bit deeper. – Tuxdude Aug 26 '11 at 00:23
  • yes Russell that is my last alternative as of now - something like a higher level abstraction to support bitfields with the struct module. – Tuxdude Aug 26 '11 at 00:25
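
The mask-based approach suggested in the comments above could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `BitfieldLayout` and its methods are hypothetical, fields are assumed to be unsigned and packed MSB-first with no padding between them, and any unused bits at the end of the last byte are ignored. The real format may well pack bits differently.

```python
from collections import OrderedDict


class BitfieldLayout(object):
    """Hypothetical helper: decode unsigned, MSB-first packed bitfields."""

    def __init__(self, fields):
        # fields: sequence of (name, width_in_bits) in stream order.
        total_bits = sum(width for _, width in fields)
        self.nbytes = (total_bits + 7) // 8
        # Precompute a (name, shift, mask) triple per field, once per layout.
        self.plan = []
        shift = self.nbytes * 8
        for name, width in fields:
            shift -= width
            self.plan.append((name, shift, (1 << width) - 1))

    def unpack(self, data, offset=0):
        chunk = data[offset:offset + self.nbytes]
        if len(chunk) < self.nbytes:
            raise ValueError("not enough data for this layout")
        # Read the whole structure as one big-endian integer, then slice it.
        value = int.from_bytes(chunk, "big")
        return OrderedDict(
            (name, (value >> shift) & mask) for name, shift, mask in self.plan
        )


# Assumed layout, mirroring the struct in the question.
MY_HEADER = BitfieldLayout([
    ("fieldA", 3),
    ("fieldB", 2),
    ("fieldC", 3),
    ("fieldD", 14),
    ("fieldE", 4),
])

fields = MY_HEADER.unpack(b"%\x0f\xa0\x80")
for name, value in fields.items():
    print("%s: %d" % (name, value))
```

Because the shifts and masks are computed once per layout, describing 50-100 structures comes down to writing 50-100 field lists, and the per-record work is one integer conversion plus a shift and mask per field.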

2 Answers

Using bitstring (which you mention you're looking at) it should be easy enough to implement. First to create some data to decode:

>>> import bitstring
>>> myheader = "3, 2, 3, 14, 4"
>>> a = bitstring.pack(myheader, 1, 0, 5, 1000, 2)
>>> a.bin
'00100101000011111010000010'
>>> a.tobytes()
'%\x0f\xa0\x80'

And then decoding it again is just

>>> a.readlist(myheader)
[1, 0, 5, 1000, 2]

Your main concern might well be the speed. The library is well optimised Python, but that's not nearly as fast as a C library would be.

technogeek1995
Scott Griffiths
  • Thanks Scott - yes I've checked your bitstring library and it comes very close to my requirements indeed. In fact I posted the question in the mailing list [here](http://groups.google.com/group/python-bitstring/browse_thread/thread/2d85a909aab9d818?tvc=2). I can understand it can be read as a list - but I'd like to preferably use a dictionary just for the convenience of code readability since the structs I'll be dealing with would have more than 20 or 30 fields easily. I know it is supported in pack, but would like to know how to use it with unpack since that will be the primary functionality. – Tuxdude Aug 27 '11 at 06:58
  • @Ash: You can't unpack to a dictionary just yet. I think you need something like the decode method proposed [here](http://code.google.com/p/python-bitstring/wiki/EncodeDecode), which hasn't been done partly because what I'd really like to return is an ordered dictionary - I'm not sure that an unordered dictionary would be that useful. I'll think about it some more though... – Scott Griffiths Aug 27 '11 at 10:08
  • Yes, it makes sense to return an ordered dictionary, but I guess its support is present directly only in Python 3.3a0 (or at least based on what the page says [here- PEP372](http://docs.python.org/dev/whatsnew/2.7.html)). – Tuxdude Aug 27 '11 at 20:25

I haven't rigorously tested this, but it seems to work with unsigned types (edit: it works with signed byte/short types, too).

Edit 2: This is really hit or miss. It depends on the way the library's compiler packed the bits into the struct, which is not standardized. For example, with gcc 4.5.3 it works as long as I don't use the attribute to pack the struct, i.e. __attribute__ ((__packed__)) (so instead of 6 bytes it gets packed into 4 bytes, which you can check with __alignof__ and sizeof). I can make it almost work by adding _pack_ = True to the ctypes Structure definition, but it fails for fieldE. gcc notes: "Offset of packed bit-field ‘fieldE’ has changed in GCC 4.4".

import ctypes

class MyHeader(ctypes.Structure):
    _fields_ = [
        ('fieldA', ctypes.c_ubyte, 3),
        ('fieldB', ctypes.c_ubyte, 2),
        ('fieldC', ctypes.c_ubyte, 3),
        ('fieldD', ctypes.c_ushort, 14),
        ('fieldE', ctypes.c_ubyte, 4),
    ]

lib = ctypes.cdll.LoadLibrary('C/bitfield.dll')

hdr = MyHeader()
lib.set_header(ctypes.byref(hdr))

for x in hdr._fields_:
    print("%s: %d" % (x[0], getattr(hdr, x[0])))

Output:

fieldA: 3
fieldB: 1
fieldC: 5
fieldD: 12345
fieldE: 9

C:

typedef struct _MyHeader {
    unsigned char  fieldA  :  3;
    unsigned char  fieldB  :  2;
    unsigned char  fieldC  :  3;
    unsigned short fieldD  : 14;
    unsigned char  fieldE  :  4;
} MyHeader, *pMyHeader; 

int set_header(pMyHeader hdr) {

    hdr->fieldA = 3;
    hdr->fieldB = 1;
    hdr->fieldC = 5;
    hdr->fieldD = 12345;
    hdr->fieldE = 9;

    return(0);
}
Eryk Sun
  • See a tested example without the need for any C code or dlls at all at [Does Python have a bitfield type?](http://stackoverflow.com/a/11481471/507544) – nealmcb Jul 14 '12 at 06:13
  • @nealmcb - Your example represents a way to store such data within Python itself. But how do you import or export such data from/to a stream of bytes that can be read from or written to disk, or received or sent over a network? – Tuxdude Jul 15 '12 at 06:33
  • @ash That is what the union is for, and the `flags.asbyte` field in that example. Thanks for pointing out that it wasn't so clear. I've polished the text there to make it a bit more clear. Heh :) – nealmcb Jul 17 '12 at 13:55
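
The union idea from the last comment can be sketched without any C code or DLL, roughly as follows. This is a minimal illustration, not the linked answer's code: the names `HeaderBits` and `Header` are made up here, only the three sub-byte fields are covered, and it assumes a little-endian host where ctypes allocates bitfields starting from the least significant bit. As noted in the answer above, bitfield layout is not standardized, so this should be checked against real data before relying on it.

```python
import ctypes


class HeaderBits(ctypes.LittleEndianStructure):
    # Three bitfields sharing one byte; allocation order is platform-dependent.
    _fields_ = [
        ("fieldA", ctypes.c_uint8, 3),
        ("fieldB", ctypes.c_uint8, 2),
        ("fieldC", ctypes.c_uint8, 3),
    ]


class Header(ctypes.Union):
    # The union overlays the bitfields with a plain byte view of the same memory.
    _anonymous_ = ("bits",)
    _fields_ = [
        ("bits", HeaderBits),
        ("asbyte", ctypes.c_uint8),
    ]


# Import: copy raw bytes (from a file or socket) into the structure.
raw = b"\xa1"
hdr = Header.from_buffer_copy(raw)
parsed = (hdr.fieldA, hdr.fieldB, hdr.fieldC)
print(parsed)

# Export: change a field and turn the structure back into bytes.
hdr.fieldB = 3
out = bytes(hdr)
print(out)
```

`from_buffer_copy` copies the input, so the parsed object stays valid after the source buffer is gone; `from_buffer` on a writable buffer such as a `bytearray` would avoid the copy at the cost of sharing memory with it.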