
I have a binary string formatted like so:

  • (short) num_tables
  • (short) table_width
  • (short) table_height
  • Tables:
    • (3 * width * height * short) table_data
    • ... repeated num_tables times

Currently I'm parsing this out with this really ugly mess:

def decode_table_data(input_bytes):
    # Data is little-endian
    num_tables = input_bytes[0] + (input_bytes[1] << 8)
    table_width = input_bytes[2] + (input_bytes[3] << 8)
    table_height = input_bytes[4] + (input_bytes[5] << 8)

    # TODO: Extract table_data

This is obviously hard to read, ugly, slow to type out, and prone to errors. I would prefer a syntax like:

def decode_table_data(input_bytes):
    num_tables = input_bytes.read_short(little_endian=True)
    table_width = input_bytes.read_short(little_endian=True)
    table_height = input_bytes.read_short(little_endian=True)

I know many languages have tools for reading byte arrays like this (read_short, read_int, etc). Is there such a tool in Python? I tried Googling around for it but couldn't find anything easily.

stevendesu
  • You may be looking for [the `struct` built-in module](https://docs.python.org/3/library/struct.html#module-struct), or maybe structs provided by [`ctypes`](https://docs.python.org/3/library/ctypes.html#structures-and-unions) – ForceBru Dec 06 '19 at 14:56
  • Does this answer your question? [Reading a binary file into a struct](https://stackoverflow.com/questions/14215715/reading-a-binary-file-into-a-struct) – mkrieger1 Dec 06 '19 at 15:01

1 Answer

You are looking for the struct module.

import struct


def decode_table_data(input_bytes):
    # "<HHH": three little-endian unsigned shorts
    header = struct.Struct("<HHH")
    num_tables, table_width, table_height = header.unpack_from(input_bytes)

    # Each table holds 3 * width * height shorts (2 bytes each)
    table = struct.Struct(f"<{3 * table_width * table_height}H")
    offset = header.size

    for _ in range(num_tables):
        table_data = table.unpack_from(input_bytes, offset)
        # Do something with table_data
        offset += table.size
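
To check the approach end to end, here is a minimal self-contained sketch that builds a small sample buffer and decodes it. It assumes the values are unsigned 16-bit shorts (use `h` instead of `H` if they are signed); the names `HEADER`, `decode_tables`, and `sample` are made up for illustration.

```python
import struct

# Header layout from the question: num_tables, table_width, table_height
HEADER = struct.Struct("<HHH")


def decode_tables(input_bytes):
    """Return (width, height, tables); each table is a flat tuple of shorts."""
    num_tables, width, height = HEADER.unpack_from(input_bytes, 0)
    # Assumption: unsigned little-endian shorts, 3 per cell.
    table = struct.Struct(f"<{3 * width * height}H")
    offset = HEADER.size
    tables = []
    for _ in range(num_tables):
        tables.append(table.unpack_from(input_bytes, offset))
        offset += table.size  # advance by the byte size, not the value count
    return width, height, tables


# Sample input: 2 tables, each 2 cells wide and 1 cell high (6 shorts per table)
sample = struct.pack("<HHH", 2, 2, 1) + struct.pack("<12H", *range(12))
width, height, tables = decode_tables(sample)
```

Compiling the format once with `struct.Struct` avoids re-parsing the format string on every iteration, and `table.size` gives the number of bytes to skip for each table.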
chepner
  • This is definitely a huge improvement on my code, but doesn't easily address the `(3 * width * height * sizeof(short))` bit when it comes to table data. In this case should I just write a for-loop and repeatedly decode `input_bytes[i : i+2]` or is there a way to define matrices in the struct format string? I skimmed the documentation but the closest thing I saw was `char[]` – stevendesu Dec 06 '19 at 15:03
  • Updated with a rough sketch of using `struct` to extract each table from the input, though you can do more if you know the exact structure of your matrix. (For example, is each cell just 6 raw bytes, or is it 3 shorts, or a long and a short, or something else?) – chepner Dec 06 '19 at 15:11
  • Each table cell consists of 3 shorts. Using your example I just wrote a nested for loop using `for t in range(0, num_tables)`, `for y in range(0, height)`, and `for x in range(0, width)` and extracted the 3 shorts using `" – stevendesu Dec 06 '19 at 15:58
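
For reference, a nested-loop decoder along the lines described in the last comment might look like the sketch below. It is not the commenter's actual code: it assumes each cell is three unsigned little-endian shorts and that cells are stored row by row (the `y` loop outside the `x` loop), and the names `CELL` and `decode_cells` are hypothetical.

```python
import struct

HEADER = struct.Struct("<HHH")  # num_tables, table_width, table_height
CELL = struct.Struct("<3H")     # assumption: 3 unsigned little-endian shorts


def decode_cells(input_bytes):
    """Return tables[t][y][x] as a 3-tuple of shorts."""
    num_tables, width, height = HEADER.unpack_from(input_bytes, 0)
    offset = HEADER.size
    tables = []
    for _t in range(num_tables):
        rows = []
        for _y in range(height):
            row = []
            for _x in range(width):
                # Assumption: cells are laid out in row-major order.
                row.append(CELL.unpack_from(input_bytes, offset))
                offset += CELL.size
            rows.append(row)
        tables.append(rows)
    return tables


# One 2x2 table whose shorts are simply 0..11
data = struct.pack("<HHH", 1, 2, 2) + struct.pack("<12H", *range(12))
cells = decode_cells(data)
```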