I am using this nice little package "construct" for binary data parsing. However, I ran into a case where the format is defined as:

31     24 23              0
+-------------------------+
| status |  an int number |
+-------------------------+

Basically, the upper 8 bits hold the status and the remaining 3 bytes hold the integer, i.e. an int with its high bits masked off. I am not sure what the proper way to define this format is:

  • The brute force way is to define it as ULInt32 and do bit masking myself
  • Is there any way I can use BitStruct to save the trouble?
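For comparison, the brute-force option from the list above can be sketched with only the standard library's struct module instead of construct. This is a minimal sketch, assuming Python 3 and little-endian data; the helper name split_status_word is hypothetical and not part of any library.

```python
import struct

def split_status_word(raw):
    """Split 4 little-endian bytes into (status, number).

    Hypothetical helper for illustration: status is bits 31..24,
    number is bits 23..0 of the 32-bit word.
    """
    (word,) = struct.unpack("<I", raw)  # unsigned 32-bit, little-endian
    status = word >> 24                 # keep the top 8 bits
    number = word & 0xFFFFFF            # keep the low 24 bits
    return status, number

print(split_status_word(b"\xff\x01\x01\x01"))  # (1, 66047)
```

With little-endian input the first byte (\xff) is the least significant one, so it ends up in the low bits of the number rather than in the status.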

Edit

Assuming little-endian data, and based on jterrace's example and the swapped=True suggestion, I think this will work in my case:

>>> from construct import BitStruct, BitField
>>> sample = "\xff\x01\x01\x01"
>>> c = BitStruct("foo", BitField("i", 24, swapped=True), BitField("status", 8))
>>> c.parse(sample)
Container({'i': 66047, 'status': 1})

Thanks

Oliver

  • What is wrong with using integer operations for a format defined on low-level integers? `val >> 24` and `val & 0xffffff` are pretty well readable. – Has QUIT--Anony-Mousse Feb 02 '12 at 21:15
  • @Anony-Mousse I'm guessing it's not that simple. This might be inside of a struct or an array, and he's already using construct to parse out those datastructures. – jterrace Feb 02 '12 at 21:39

2 Answers

This would be easy if construct contained Int24 types, but it doesn't. Instead, you can specify the bit lengths yourself like this:

>>> from construct import BitStruct, BitField
>>> sample = "\xff\x01\x01\x01"
>>> c = BitStruct("foo", BitField("status", 8), BitField("i", 24))
>>> c.parse(sample)
Container({'status': 255, 'i': 65793})

Note: The value \x01\x01\x01 is 65536 + 256 + 1 = 65793

jterrace
  • I think I am confused about the bit order: when you write "\xff\x01\x01\x01", is \xff the high byte, i.e. bits [24, 32)? For some reason, I always thought the fields should be listed as 24-bit, then 8-bit, in that order. However, since both answers put the 8-bit field first, I am thinking I got it backward. Can you clarify? Thanks – Oliver Feb 03 '12 at 03:02
  • @Oliver it depends whether it's big-endian or little-endian. What format is your data in? Can you give an example of one of your byte strings? – jterrace Feb 03 '12 at 03:08
  • I know I am using little endian for sure, but I don't have a specific byte string as an example. I did a few tests, and \xff seems to belong to bits 0-7. – Oliver Feb 03 '12 at 03:17
  • @Oliver Just switch the positions in BitStruct if necessary, and you might need to pass `swapped=True` to the `BitField` constructor to reverse the endianness. – jterrace Feb 03 '12 at 03:32
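The endianness question in the comments above can be checked without construct. The following is a rough sketch, assuming Python 3; the decode helper is hypothetical and only mirrors the field names used in this thread. It reads the same four bytes under both byte orders and reproduces the two results quoted in the question and in this answer.

```python
sample = b"\xff\x01\x01\x01"

def decode(raw, byteorder):
    """Hypothetical helper: read 4 bytes as one 32-bit word, then split it."""
    word = int.from_bytes(raw, byteorder)
    return {"status": word >> 24,      # bits 31..24
            "i": word & 0xFFFFFF}      # bits 23..0

big = decode(sample, "big")        # \xff is the most significant byte
little = decode(sample, "little")  # \xff is the least significant byte

print(big)     # {'status': 255, 'i': 65793}
print(little)  # {'status': 1, 'i': 66047}
```

Read as big-endian, \xff lands in the status field; read as little-endian, it lands in the low 8 bits of the number, which matches the observation that \xff belongs to bits 0-7.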
BitStruct("foo",
          BitField("status", 8),
          BitField("number", 24))
Ricardo Cárdenes