I have binary data that is stored in a non-trivial format where the information 'chunks' are not a fixed size and are similar to packets. I am reading them dynamically using this function:

import struct
from io import BytesIO

def unpack_bytes(stream: BytesIO, binary_format: str) -> tuple:
    size = struct.calcsize(binary_format)
    buf = stream.read(size)  # returns fewer than `size` bytes at EOF
    print(buf)
    return struct.unpack(binary_format, buf)

This function is called with the appropriate format as needed and the code that creates the stream and loops over it is as follows:

def parse_data_file(data_directory: str) -> Generator[CompressedFile]:
    with open(data_directory, 'rb') as packet_stream:
        while <EOF file logic here>:
            contents = parse_packet(packet_stream)
            contents = gzip.compress(data=contents, compresslevel=9)
            yield CompressedFile(filename=f"{uuid.uuid4()}.gz", datetime=datetime.now(),
                                 contents=contents)

CompressedFile is just a small dataclass to store the result, and parse_packet extracts a single packet (as per the data spec) from the bin file and returns its contents. Since the packets don't have a fixed width, I am wondering what the best way to stop the loop would be. The two options I know of are:

  1. Add some extra logic to unpack_bytes() to bubble up an EOF.
  2. Do some cursor-foo to save the EOF position and check against it as it loops. I'd like to not manipulate the cursor directly if possible.
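
A minimal sketch of option 1, assuming a short read is treated as end of file. `read_all` is a hypothetical caller added for illustration, and the `"<H"` format is an arbitrary example, not the real data spec:

```python
import struct
from io import BytesIO


def unpack_bytes(stream, binary_format: str) -> tuple:
    """Unpack one struct from the stream, raising EOFError when data runs out."""
    size = struct.calcsize(binary_format)
    buf = stream.read(size)
    if len(buf) < size:
        # A short read means the stream ended before a full struct was available.
        raise EOFError(f"expected {size} bytes, got {len(buf)}")
    return struct.unpack(binary_format, buf)


def read_all(stream, binary_format: str) -> list:
    """Hypothetical caller: collect structs until unpack_bytes signals EOF."""
    items = []
    while True:
        try:
            items.append(unpack_bytes(stream, binary_format))
        except EOFError:
            break
    return items


values = read_all(BytesIO(struct.pack("<3H", 1, 2, 3)), "<H")
```

Note that this also treats a truncated trailing struct as a normal end of file, which may or may not be acceptable for a given format.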

Is there a more idiomatic way to check for EOF within parse_data_file?

The last call to parse_packet (and by extension the last call to unpack_bytes) will consume all the data and the cursor will be at the end when the next iteration of the loop begins. I'd like to take advantage of that state instead of adding EOF handling code all the way up from unpack_bytes or fiddling with the cursor directly.
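
One way to use that end-of-stream state without seeking might be `peek()`, which `io.BufferedReader` (the type returned by `open(path, 'rb')` with default buffering) provides. The sketch below is an illustration under assumptions: `at_eof` and `iter_packets` are hypothetical names, and the `parse_packet` stand-in uses an invented length-prefixed layout, not the real data spec.

```python
import io
import struct


def at_eof(stream: io.BufferedReader) -> bool:
    """Return True when no bytes remain, without advancing the read position."""
    # peek() returns buffered bytes without consuming them; for a regular
    # blocking file it comes back empty only once the data is exhausted.
    return not stream.peek(1)


def parse_packet(stream: io.BufferedReader) -> bytes:
    """Hypothetical stand-in: a 2-byte little-endian length, then the payload."""
    (length,) = struct.unpack("<H", stream.read(2))
    return stream.read(length)


def iter_packets(stream: io.BufferedReader):
    """Yield packets until the stream is exhausted."""
    while not at_eof(stream):
        yield parse_packet(stream)


# Demo on in-memory data wrapped in a BufferedReader.
raw = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 5) + b"hello"
packets = list(iter_packets(io.BufferedReader(io.BytesIO(raw))))
```

In `parse_data_file`, the loop condition would then read `while not at_eof(packet_stream):`. A plain `BytesIO` has no `peek()`, so this only applies to buffered readers.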

martineau
moonman4
  • Hi! Did you check this answer? https://stackoverflow.com/a/10140333/13658055 – pugi Jan 20 '22 at 23:53
  • You can use Python's built-in `iter()` function to dynamically create an iterator object as shown in its [documentation](https://docs.python.org/3/library/functions.html#iter). It can be used with structs as shown in this [answer](https://stackoverflow.com/a/14216741/355230) of mine to another question. – martineau Jan 21 '22 at 00:01
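
A rough sketch of the two-argument `iter(callable, sentinel)` form mentioned in the comment above, assuming each packet starts with a fixed-size header. The `HEADER` layout (a `uint16` payload length) and the `iter_packets` name are hypothetical, chosen only to make the example runnable:

```python
import struct
from functools import partial
from io import BytesIO

HEADER = struct.Struct("<H")  # hypothetical header: payload length as uint16


def iter_packets(stream):
    """Yield payloads; iter() stops when the header read returns b''."""
    read_header = partial(stream.read, HEADER.size)
    for header in iter(read_header, b""):
        (length,) = HEADER.unpack(header)
        yield stream.read(length)


raw = HEADER.pack(2) + b"hi" + HEADER.pack(4) + b"data"
payloads = list(iter_packets(BytesIO(raw)))
```

The sentinel `b""` only matches a clean end of file; a truncated header would still reach `HEADER.unpack` and raise `struct.error`.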
